Comparison of CPU and GPU parallelization approaches between two programming languages in copepod model simulations
- 1University of Ruhuna, Faculty of Fisheries and Marine Sciences and Technology, Department of Oceanography and Marine Geology, Matara, Sri Lanka (varshanibrabaharan@gmail.com)
- 2University of Ruhuna, Faculty of Fisheries and Marine Sciences and Technology, Department of Oceanography and Marine Geology, Matara, Sri Lanka (sachithma99@gmail.com)
- 3University of Tromsø, Department of Arctic and Marine Biology, Norway (info@kanchanabandara.com)
This study presents a comparative assessment of two high-performance computing languages, Java and FORTRAN, with respect to the computation vs. communication trade-off observed during a strategy-oriented copepod model simulation. We compared the computation times of (i) sequential, (ii) latency-oriented (CPU) and (iii) throughput-oriented (GPU) designs. CPU-based parallelization was performed on a 4-core Intel i7 processor clocked at 1.99 GHz. On this CPU, we implemented (i) a fork/join framework design based on the work-stealing algorithm in Java and (ii) Open Multi-Processing (OpenMP), a directive-based application programming interface (API) with a shared-memory architecture, in FORTRAN 95. The GPU processing power was leveraged using the CUDA framework in Java and the OpenACC API in FORTRAN on an NVIDIA GeForce MX230 with 256 unified pipelines.

The simulation time for sequential CPU execution was ca. 41% lower in FORTRAN than in Java (18 s vs. 25 s). Likewise, the FORTRAN execution time in the latency-oriented CPU design was ca. 43% lower than that of Java (10 s vs. 13 s). In the GPU approach with unified memory space accessibility, the Java computation consumed ca. 38% less time than FORTRAN (5 s vs. 8 s).

Unlike FORTRAN, Java is a purely object-oriented language, and object handling is not optimized in the GNU compilers of FORTRAN. Nevertheless, memory consumption in FORTRAN can be fine-tuned, thereby decreasing latency, unlike in Java. The OpenMP API is based on a relaxed-consistency, shared-memory model: its temporary view of memory allows threads to cache variables and thereby reduce latency by avoiding a memory access for each variable reference, unlike the fork/join framework in Java. Furthermore, OpenMP has threadprivate memory, which allows efficient synchronization within the code.
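To illustrate the latency-oriented CPU design on the Java side, the following is a minimal sketch of the fork/join work-stealing pattern used here: a population array is recursively split into subtasks until a sequential threshold is reached, and idle pool workers steal pending subtasks. The class and method names (`ForkJoinSketch`, `GrowthTask`, `step`) and the per-individual growth update are illustrative assumptions, not code from the study.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class ForkJoinSketch {

    // Hypothetical per-individual update: one growth step for a copepod's body mass.
    static double step(double mass) {
        return mass * 1.01; // placeholder growth increment
    }

    // Recursive divide-and-conquer task over a slice [lo, hi) of the population.
    static class GrowthTask extends RecursiveAction {
        static final int THRESHOLD = 1_000; // below this size, compute sequentially
        final double[] masses;
        final int lo, hi;

        GrowthTask(double[] masses, int lo, int hi) {
            this.masses = masses;
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected void compute() {
            if (hi - lo <= THRESHOLD) {
                for (int i = lo; i < hi; i++) masses[i] = step(masses[i]);
            } else {
                int mid = (lo + hi) >>> 1;
                // Fork both halves; idle workers in the pool can steal either subtask.
                invokeAll(new GrowthTask(masses, lo, mid),
                          new GrowthTask(masses, mid, hi));
            }
        }
    }

    // Advance the whole population by one time step on a 4-worker pool
    // (matching the 4-core CPU used in the study).
    public static void run(double[] masses) {
        ForkJoinPool pool = new ForkJoinPool(4);
        pool.invoke(new GrowthTask(masses, 0, masses.length));
        pool.shutdown();
    }

    public static void main(String[] args) {
        double[] m = new double[10_000];
        java.util.Arrays.fill(m, 1.0);
        run(m);
        System.out.println("first mass after one step: " + m[0]);
    }
}
```

Unlike OpenMP's directive-based worksharing, this approach makes the task decomposition explicit in code, which is part of the programming-effort difference noted above.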
OpenACC is designed as a high-level, platform-independent accelerator programming model that offers a pragmatic entry point to GPU programming without much programming effort. Nevertheless, some uses of unified memory space accessibility on NVIDIA GPUs are better supported in CUDA, despite OpenACC having a cache directive. Therefore, it is best to investigate the performance of different accelerator models and programming languages depending on the simulation needs and efficiency targets of the model.
Keywords: FORTRAN, Java, OpenMP, OpenACC, high-performance computing, copepods, modelling
How to cite: Brabaharan, V., Edirisinghe, S., and Bandara, K.: Comparison of CPU and GPU parallelization approaches between two programming languages in copepod model simulations, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-16817, https://doi.org/10.5194/egusphere-egu23-16817, 2023.