- (1) Euro-Mediterranean Center on Climate Change, Lecce, Italy
- (2) Department of Engineering for Innovation, University of Salento, Lecce, Italy
- (3) Department of Physics and Astronomy, Bologna University, Bologna, Italy
Numerical reproducibility is a crucial yet often overlooked challenge in ensuring the credibility of computational results and the validity of Earth system models. In large-scale, massively parallel simulations, achieving numerical reproducibility is complicated by factors such as heterogeneous HPC architectures, the intricacies of floating-point arithmetic, complex hardware/software dependencies, and the non-deterministic nature of parallel execution.
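One of these factors can be illustrated in a few lines. The following is a minimal sketch, not taken from OceanVar, of how floating-point addition is not associative, so that the same values summed with a different partitioning can give a different bit pattern. The helper names (`bits`, `naive_sum`, `chunked_sum`) and the input values are hypothetical and chosen only for illustration; the "ranks" are simulated by slicing a list rather than by real MPI processes.

```python
import struct


def bits(x: float) -> str:
    """Return the IEEE 754 binary64 bit pattern of x as a hex string."""
    return struct.pack(">d", x).hex()


def naive_sum(values):
    """Left-to-right summation with no compensation.

    An explicit loop is used instead of the built-in sum(), which may
    apply compensated summation to floats in recent Python versions.
    """
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, n_chunks):
    """Sum each chunk separately, then combine the partial sums in order.

    This mimics a reduction over n_chunks simulated ranks (an assumption
    for illustration, not an actual MPI reduction).
    """
    size = (len(values) + n_chunks - 1) // n_chunks
    partials = [naive_sum(values[i:i + size]) for i in range(0, len(values), size)]
    return naive_sum(partials)


# Values chosen so that rounding depends on the order of operations.
values = [1.0e16, 1.0, -1.0e16, 1.0]

serial = chunked_sum(values, 1)     # one simulated rank
two_ranks = chunked_sum(values, 2)  # two simulated ranks

print(serial, bits(serial))
print(two_ranks, bits(two_ranks))
print("bitwise identical:", bits(serial) == bits(two_ranks))
```

Comparing bit patterns rather than using a tolerance is what distinguishes a bitwise check from an approximate one: the two results above are both plausible roundings of the same mathematical sum, yet they are not identical.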
This work addresses the challenges of debugging and ensuring bitwise reproducibility (BR) in parallel simulations, specifically for the MPI-parallelised OceanVar data assimilation model. We explore methods for detecting and resolving BR-related bugs, focusing on an automated debugging process. Mature tools for automating this process are currently lacking for bugs caused by MPI parallelisation, which makes automatic BR verification in scientific workflows involving such codebases a time-consuming challenge.
However, BR is sometimes considered unrealistic in workflows involving heterogeneous computing architectures. As an alternative, statistical reproducibility (SR) has been proposed and explored by various research groups in the Earth system modelling community, and automated tools have been developed for it. For example, the scientific workflow of CESM supports automatic verification of SR using the CESM-ECT framework/PyCECT software. When SR fails, a root-cause analysis tool, CESM-RUANDA, is available, although it is currently not fully functional. We explore SR as an alternative and complementary approach to BR, focusing on its potential to support numerical reproducibility in workflows involving heterogeneous computing architectures.
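The general idea of SR can be sketched as follows. This is a deliberately simplified illustration and not the method implemented in CESM-ECT or PyCECT: it assumes a single scalar diagnostic per run, a small reference ensemble, and a z-score threshold, all of which are hypothetical choices made here for illustration (including the function name `is_statistically_consistent` and the parameter `z_max`).

```python
import statistics


def is_statistically_consistent(ensemble, candidate, z_max=3.0):
    """Check whether a candidate run falls within the ensemble spread.

    ensemble  : scalar diagnostics from trusted reference runs (assumed)
    candidate : the same diagnostic from the run under test
    z_max     : hypothetical acceptance threshold on the z-score
    """
    mean = statistics.fmean(ensemble)
    std = statistics.stdev(ensemble)  # sample standard deviation
    if std == 0.0:
        # A degenerate ensemble has no spread, so only an exact match passes.
        return candidate == mean
    return abs(candidate - mean) / std <= z_max


# Made-up diagnostics, e.g. a domain-mean value from five perturbed runs.
reference = [10.01, 9.99, 10.02, 9.98, 10.00]

print(is_statistically_consistent(reference, 10.005))  # close to the ensemble
print(is_statistically_consistent(reference, 10.5))    # far outside the spread
```

Unlike a bitwise check, a test of this kind tolerates the small differences expected between architectures or compilers, at the cost of requiring a reference ensemble and a choice of threshold.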
How to cite: Carere, F., Mele, F., Epicoco, I., Adani, M., Oddo, P., Jansen, E., Cipollone, A., and Aydogdu, A.: Workflows for numerical reproducibility in the OceanVar data assimilation model, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-18040, https://doi.org/10.5194/egusphere-egu25-18040, 2025.