Acceleration of the non-hydrostatic dynamical core of RegCM using GPUs
- 1 Netherlands eScience Center, Amsterdam, the Netherlands
- 2 Abdus Salam International Centre for Theoretical Physics, Trieste, Italy
- 3 ATOS Center for Excellence in Performance Programming, Paris, France
Within the ESiWACE-2 project, a work package was dedicated to providing services to the European Earth system modeling community; the primary aim of these services was to advance weather and climate model components towards exascale hardware architectures. As the bulk of this software consists of MPI-parallelized Fortran code, significant advances in design and engineering are needed to exploit the potential of, for example, the GPU-equipped supercomputers that constitute the majority of the (pre-)exascale systems emerging in Europe in the near future. The service was organized as a call for projects, in which awarded modeling groups benefited from a six person-month collaboration with HPC experts within the ESiWACE consortium.
One such project targeted the regional climate model RegCM, a state-of-the-art limited-area model developed by the Earth System Physics section of the Abdus Salam International Centre for Theoretical Physics (ICTP) for long-term regional climate simulation. RegCM has participated in numerous intercomparison projects and is designed to be a public, open-source, user-friendly and portable code that can be applied to any region of the world. Its user base extends well beyond Europe, to industrialized countries (e.g. the US) as well as developing nations. The latest development cycle has seen the addition of a non-hydrostatic dynamical core which, coupled with one-dimensional packages solving the sub-grid-scale physics of convection, water phase changes, the boundary layer, and short-wave solar and long-wave terrestrial radiation, allows the model to be integrated in time to produce climate scenario simulations. The model is internally coupled to the Community Land Model (CLM4.5) to describe atmosphere-surface interactions.
Within the ESiWACE-2 service project, we have ported this dynamical core to GPUs using the OpenACC programming model. Within the limited time frame, we adopted three main optimizations: (i) restructuring the zonal and meridional advection loops to expose full three-dimensional parallelism, (ii) using direct GPU-to-GPU communication through device-aware MPI calls, and (iii) minimizing GPU-CPU exchanges by excluding any data transfers back to the host except for I/O and the physics parameterizations. The first two techniques are illustrated in the sketch below.
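The following simplified Fortran/OpenACC sketch illustrates techniques (i) and (ii). It is not taken from the RegCM source: the subroutine, array and variable names (advect_u, exchange_halo, flux, tend, etc.) and the centred-difference update are illustrative assumptions, but the structure, a fully collapsed three-dimensional loop for an advective tendency and a halo exchange that hands device pointers to MPI, follows the approach described above.

subroutine advect_u(u, flux, tend, dx, ids, ide, jds, jde, kds, kde)
  implicit none
  integer, intent(in)    :: ids, ide, jds, jde, kds, kde
  real(8), intent(in)    :: dx
  real(8), intent(in)    :: u(ids-1:ide+1, jds:jde, kds:kde)
  real(8), intent(in)    :: flux(ids-1:ide+1, jds:jde, kds:kde)
  real(8), intent(inout) :: tend(ids:ide, jds:jde, kds:kde)
  integer :: i, j, k
  ! (i) collapse all three spatial loops so the GPU sees one large iteration
  !     space instead of a short outer loop over vertical levels
  !$acc parallel loop collapse(3) present(u, flux, tend)
  do k = kds, kde
    do j = jds, jde
      do i = ids, ide
        tend(i,j,k) = tend(i,j,k) &
             - (flux(i+1,j,k)*u(i+1,j,k) - flux(i-1,j,k)*u(i-1,j,k)) / (2.0d0*dx)
      end do
    end do
  end do
end subroutine advect_u

subroutine exchange_halo(sendbuf, recvbuf, n, east, west, comm)
  use mpi
  implicit none
  integer, intent(in)  :: n, east, west, comm
  real(8), intent(in)  :: sendbuf(n)
  real(8), intent(out) :: recvbuf(n)
  integer :: ierr, status(MPI_STATUS_SIZE)
  ! (ii) host_data exposes the device addresses of the halo buffers (assumed
  !      already present on the GPU), so a device-aware MPI library can move
  !      the data GPU-to-GPU without staging it on the host
  !$acc host_data use_device(sendbuf, recvbuf)
  call MPI_Sendrecv(sendbuf, n, MPI_DOUBLE_PRECISION, east, 0, &
                    recvbuf, n, MPI_DOUBLE_PRECISION, west, 0, &
                    comm, status, ierr)
  !$acc end host_data
end subroutine exchange_halo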
Using these programming techniques, we were able to construct a dynamical core for RegCM that runs exclusively on the GPU. For benchmarking, we use the ‘Alps’ test case, a 3 km-resolution mesh with ~14M grid columns over the Alpine region, representative of future convection-permitting model configurations. Benchmarks on the JUWELS-Booster supercomputer show an acceleration by more than a factor of two at low node counts (1-3), which diminishes as more nodes are allocated; at 8 nodes, the CPU and GPU versions run at comparable speed. On the previous-generation system Marconi-100, the accelerated version is consistently faster by a factor of ~2.7.
Looking forward, performance profiles indicate that the GPU-resident code is mostly bound by MPI communication latency within the advection sub-stepping. Techniques to mitigate these penalties are currently being investigated. Moreover, more fine-grained parallelization of complex loops, for example through tuned tiling directives as sketched below, can further improve the performance of the non-hydrostatic dynamical core of RegCM.
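As a minimal sketch of such tuning (again illustrative rather than actual RegCM code), the OpenACC tile clause can be applied to nested stencil loops; the smoothing kernel and the 32x4 tile size below are placeholders to be tuned per loop and per GPU.

subroutine smooth(a, b, ids, ide, jds, jde)
  implicit none
  integer, intent(in)  :: ids, ide, jds, jde
  real(8), intent(in)  :: a(ids-1:ide+1, jds-1:jde+1)
  real(8), intent(out) :: b(ids:ide, jds:jde)
  integer :: i, j
  ! tile(32,4) strip-mines the i/j loops into 32x4 blocks to improve data
  ! locality; the tile sizes are hypothetical and would be tuned per kernel
  !$acc parallel loop tile(32,4) present(a, b)
  do j = jds, jde
    do i = ids, ide
      b(i,j) = 0.25d0*(a(i-1,j) + a(i+1,j) + a(i,j-1) + a(i,j+1))
    end do
  end do
end subroutine smooth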
How to cite: Sclocco, A., van den Oord, G., Giuliani, G., Girotto, I., Raffin, E., and van Werkhoven, B.: Acceleration of the non-hydrostatic dynamical core of RegCM using GPUs, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-7333, https://doi.org/10.5194/egusphere-egu23-7333, 2023.