EGU23-12858, updated on 26 Feb 2023
https://doi.org/10.5194/egusphere-egu23-12858
EGU General Assembly 2023
© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Performance evaluation of NEMO4.2 with Paraver

Francesca Mele1, Italo Epicoco1,2, Silvia Mocavero1, and Jesus Labarta3
  • 1Euro-Mediterranean Centre on Climate Change, Foundation, Italy
  • 2University of Salento, Dep. Engineering for Innovation, Lecce, Italy
  • 3Barcelona Supercomputing Center, Spain

The latest release of the NEMO ocean model, v4.2, includes many modifications that significantly affect model performance. The goal of this work is to assess the performance NEMO has gained from the optimizations carried out over the last four years within the IMMERSE and IS-ENES3 projects. The computational analysis was conducted using Extrae and Paraver, the performance tools developed at the Barcelona Supercomputing Center.

Extrae produces a trace rich in information about how the model uses computational resources, including measurements related to the memory subsystem, instruction cycles, vectorization level, communication among parallel processes, and many others. Paraver supports visual inspection of the trace and gives insight into the computational behavior of the NEMO model, which makes it easy to set up a detailed quantitative evaluation of performance issues.

The performance analysis of NEMO is based on a set of metrics, each related to a different aspect of computational resource usage. The main aspects analyzed are execution time, communication time, the number of instructions per cycle, and the cache hit rate. In addition, we combined these metrics to evaluate the parallel scalability and the global efficiency of the model as the number of cores increases.
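As a minimal sketch of how such metrics can be derived from raw counters, the Python snippet below uses textbook strong-scaling definitions. The `RunSample` record, its field names, and all helper functions are hypothetical and introduced only for illustration; they are not the Extrae or Paraver API, and the exact metric definitions used in this work may differ.

```python
from dataclasses import dataclass


@dataclass
class RunSample:
    """Hypothetical aggregated counters for one run (not an Extrae/Paraver type)."""
    cores: int
    elapsed_s: float       # wall-clock time of the run
    comm_s: float          # time spent in communication
    instructions: float    # completed instructions
    cycles: float          # elapsed CPU cycles
    cache_hits: float
    cache_accesses: float


def ipc(run: RunSample) -> float:
    """Instructions per cycle."""
    return run.instructions / run.cycles


def cache_hit_rate(run: RunSample) -> float:
    """Fraction of cache accesses served by the cache."""
    return run.cache_hits / run.cache_accesses


def comm_fraction(run: RunSample) -> float:
    """Share of the execution time spent in communication."""
    return run.comm_s / run.elapsed_s


def speedup(base: RunSample, run: RunSample) -> float:
    """Strong-scaling speedup of `run` relative to the baseline run."""
    return base.elapsed_s / run.elapsed_s


def parallel_efficiency(base: RunSample, run: RunSample) -> float:
    """Speedup divided by the increase in core count (1.0 is ideal)."""
    return speedup(base, run) * base.cores / run.cores
```

With two samples taken at different core counts, `parallel_efficiency(base, run)` shows how far the larger run falls from ideal scaling, while `ipc` and `cache_hit_rate` help separate computational effects from communication effects.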

Our investigation focused on evaluating the impact of the latest HPC changes, namely: the use of the neighborhood collective communication pattern, available in MPI3, for the halo exchange; the use of the loop fusion technique to improve data locality; the extended halo; and the MPI+OpenMP version of NEMO obtained by means of PSyclone, a DSL compiler developed at STFC.
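To illustrate the loop fusion technique in general terms, the sketch below contrasts two separate sweeps over the data with a single fused sweep. It is written in Python for readability only: NEMO itself is Fortran, the function names are hypothetical, and a Python example cannot reproduce the cache behavior measured in this work.

```python
def unfused(a, b):
    """Two separate loops: the intermediate array is written, then read again."""
    n = len(a)
    tmp = [0.0] * n
    for i in range(n):
        tmp[i] = a[i] + b[i]
    out = [0.0] * n
    for i in range(n):
        out[i] = 2.0 * tmp[i]
    return out


def fused(a, b):
    """One fused loop: each element is produced and consumed in the same iteration."""
    n = len(a)
    out = [0.0] * n
    for i in range(n):
        out[i] = 2.0 * (a[i] + b[i])
    return out
```

Both functions return the same result. In a compiled language the fused form avoids a second pass over memory, which is where the data locality gain is expected to come from.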

The analysis was carried out on the MareNostrum4 supercomputer at BSC with the NEMO source code at commit 1d9676ff (a.k.a. the 68-summer-body-2022 branch), using the Bench Test configured for an ORCA12-like resolution. The MPI+OpenMP version was evaluated using NEMO 4.0 in the ORCA025 configuration, kindly provided by STFC as output of the PSyclone DSL compiler.

The use of the extended halo with 2 points improves performance significantly, by 13%, thanks to a reduction in the number of exchanged messages.

The use of MPI3 communications does not bring much benefit: the reduction in the number of exchanges relative to MPI point-to-point communication is offset by the larger message size of the MPI3 neighborhood collectives.
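One common way to reason about this kind of trade-off is the latency-bandwidth (alpha-beta) cost model, sketched below. The model is an assumption brought in for illustration and is not part of the analysis reported here; the function name and all numbers in the usage note are hypothetical.

```python
def exchange_time(n_messages, bytes_per_message, latency_s, bandwidth_bytes_per_s):
    """Estimated time for one halo exchange under the alpha-beta model.

    Each message pays a fixed latency plus a size-dependent transfer time.
    """
    per_message = latency_s + bytes_per_message / bandwidth_bytes_per_s
    return n_messages * per_message
```

For example, with a hypothetical latency of 1e-6 s and a bandwidth of 1e9 bytes/s, 8 messages of 1000 bytes and 4 messages of 3000 bytes are estimated to take the same time: halving the message count saves latency, but the larger messages spend that saving on transfer time.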

Loop fusion does not bring much benefit either: only a few routines use it, and the small reduction in cache misses is offset by an increase in the number of instructions caused by fusing the loops.

The analysis of the traces of the hybrid MPI/OpenMP NEMO version processed by PSyclone does not show much benefit as the number of OpenMP threads increases, because part of the code is not parallelized.
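The limiting effect of non-parallelized code is commonly described by Amdahl's law. The sketch below is a generic illustration of that law, not a model fitted to the NEMO traces; the function name is hypothetical and the parallel fraction in the usage note is an arbitrary example value.

```python
def amdahl_speedup(parallel_fraction, threads):
    """Upper bound on speedup when only part of the code runs in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)
```

If, say, half of the runtime were serial, `amdahl_speedup(0.5, threads)` would stay below 2 no matter how many threads are used, which is the kind of saturation described above.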

Finally, one of the most important HPC developments, tiling, has not been analyzed yet, since the latest modifications were merged only recently and the resulting code is still under revision.

How to cite: Mele, F., Epicoco, I., Mocavero, S., and Labarta, J.: Performance evaluation of NEMO4.2 with Paraver, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-12858, https://doi.org/10.5194/egusphere-egu23-12858, 2023.