GPU-HADVPPM: high-efficient parallel GPU design of the Piecewise Parabolic Method (PPM) for horizontal advection in air quality model (CAMx)
- Beijing Normal University, Institute for Global Change and Earth System Science, Beijing, China (caokai@mail.bnu.edu.cn)
With semiconductor technology gradually approaching its physical and thermal limits, graphics processing units (GPUs) are becoming an attractive solution for many scientific applications because of their high performance. This paper presents an application of GPU accelerators to an air quality model. We demonstrate an approach that runs the PPM solver of horizontal advection (HADVPPM) for the air quality model CAMx on GPU clusters. Specifically, we first convert HADVPPM from its original Fortran form to new Compute Unified Device Architecture C (CUDA C) code so that it can run on the GPU (GPU-HADVPPM). Then, a series of optimization measures are taken to improve the overall computing performance of CAMx, including reducing the CPU-GPU communication frequency, increasing the amount of data processed on the GPU, and optimizing the GPU memory access order. Finally, a heterogeneous, hybrid programming paradigm (MPI+CUDA) is presented and used with GPU-HADVPPM on GPU clusters. After the consistency of the results is verified, offline experiments show that running GPU-HADVPPM on one K40 GPU and one V100 GPU achieves up to 845.4x and 1113.6x acceleration, respectively. With the series of optimization schemes implemented, the CAMx model coupled with GPU-HADVPPM achieves a 12.7x and 94.8x improvement in computational efficiency using one GPU accelerator card on the K40 and V100 clusters, respectively. The multi-GPU acceleration algorithm enables a 3.9x speedup with 8 CPU cores and 8 GPU accelerators on the V100 cluster.
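To make two of the optimization measures above more concrete, the following minimal Python sketch illustrates why memory access order and communication frequency matter when a Fortran routine is ported to the GPU. It is not the authors' CUDA C code. The function names, the grid dimensions, and the simple transfer-counting model (one host-to-device copy per 1-D row or column in the unbatched case, a single copy in the batched case) are assumptions made purely for illustration.

```python
"""Illustrative sketch only; not the authors' CUDA C implementation."""


def colmajor_offset(i, j, k, nx, ny):
    """Flat 0-based offset of (i, j, k) in a Fortran-ordered (nx, ny, nz) array."""
    return i + nx * (j + ny * k)


def stride(axis, nx, ny):
    """Distance in elements between neighbours along one axis (0=x, 1=y, 2=z)."""
    step = [0, 0, 0]
    step[axis] = 1
    return colmajor_offset(*step, nx, ny) - colmajor_offset(0, 0, 0, nx, ny)


def transfers_per_step(nx, ny, nz, nspec, batched):
    """Hypothetical count of host-to-device copies for one x sweep and one y sweep.

    Unbatched: one copy per 1-D row (x sweep) and per 1-D column (y sweep),
    for every layer and species. Batched: the whole field is copied once.
    """
    if batched:
        return 1
    return nspec * nz * (ny + nx)


if __name__ == "__main__":
    nx, ny, nz, nspec = 4, 3, 2, 5  # hypothetical, tiny grid
    # Neighbours in x are contiguous; neighbours in y and z are strided.
    print(stride(0, nx, ny), stride(1, nx, ny), stride(2, nx, ny))  # 1 4 12
    # Many small copies versus one large copy.
    print(transfers_per_step(nx, ny, nz, nspec, batched=False))  # 70
    print(transfers_per_step(nx, ny, nz, nspec, batched=True))   # 1
```

In this toy model, a sweep along x touches adjacent memory locations, while a sweep along y jumps nx elements at a time, which is the kind of access pattern that reordering is meant to avoid on a GPU. Likewise, sending the full field in one copy replaces many small transfers, at the cost of needing more device memory.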
Figure 1. The calling and computation process of the HADVPPM function on the CPU-GPU heterogeneous platform.
Figure 2. (a) The offline performance of the HADVPPM scheme on CPU and GPU. Wall times for the offline performance experiments are given in milliseconds (ms). (b) The total elapsed time of CAMx-CUDA V1.3 on multiple GPUs. Elapsed times are given in seconds (s). The orange bars indicate the elapsed time of CAMx on the CPU, the blue bars show the elapsed time on the CPU-GPU heterogeneous platform, and the red line indicates the speedup ratio on the heterogeneous platform.
How to cite: Cao, K. and Wu, Q.: GPU-HADVPPM: high-efficient parallel GPU design of the Piecewise Parabolic Method (PPM) for horizontal advection in air quality model (CAMx), EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-4859, https://doi.org/10.5194/egusphere-egu23-4859, 2023.