EGU25-18040, updated on 15 Mar 2025
https://doi.org/10.5194/egusphere-egu25-18040
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Tuesday, 29 Apr, 10:45–12:30 (CEST), Display time Tuesday, 29 Apr, 08:30–12:30 | Hall X4, X4.12
Workflows for numerical reproducibility in the OceanVar data assimilation model
Francesco Carere (1), Francesca Mele (1), Italo Epicoco (1,2), Mario Adani (1), Paolo Oddo (1,3), Eric Jansen (1), Andrea Cipollone (1), and Ali Aydogdu (1)
  • (1) Euro-Mediterranean Center on Climate Change, Lecce, Italy
  • (2) Department of Engineering for Innovation, University of Salento, Lecce, Italy
  • (3) Department of Physics and Astronomy, Bologna University, Bologna, Italy

Numerical reproducibility is a crucial yet often overlooked requirement for the credibility of computational results and the validity of Earth system models. In large-scale, massively parallel simulations, achieving numerical reproducibility is complicated by factors such as heterogeneous HPC architectures, floating-point intricacies, complex hardware/software dependencies, and the non-deterministic nature of parallel execution.
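One of the floating-point intricacies mentioned above is that addition is not associative, so the order in which partial results are combined can change the last bits of a sum. The following is a minimal sketch in plain Python, not taken from OceanVar; the helper names `bits` and `reduce_in_order` and the three "per-rank" partial sums are hypothetical and only illustrate the effect.

```python
import struct


def bits(x: float) -> str:
    """Return the IEEE 754 double-precision bit pattern of x as a hex string."""
    return struct.pack(">d", x).hex()


def reduce_in_order(partials, order):
    """Add partial sums one at a time, visiting ranks in the given order."""
    total = 0.0
    for rank in order:
        total += partials[rank]
    return total


# Hypothetical partial sums held by three parallel ranks (assumption for illustration).
partials = [0.1, 0.2, 0.3]

forward = reduce_in_order(partials, [0, 1, 2])   # (0.1 + 0.2) + 0.3
backward = reduce_in_order(partials, [2, 1, 0])  # (0.3 + 0.2) + 0.1

print(forward, bits(forward))
print(backward, bits(backward))
print("bitwise identical:", bits(forward) == bits(backward))
```

Both results are valid roundings of the same mathematical sum, yet they differ in the last bit. A reduction whose combination order depends on message arrival or on the domain decomposition can therefore produce different bits from run to run.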

This work addresses the challenges of debugging and ensuring bitwise reproducibility (BR) in parallel simulations, specifically for the MPI-parallelised OceanVar data assimilation model. We explore methods for detecting and resolving BR-related bugs, focusing on an automated debugging process. Mature tools for automating this process are currently lacking for bugs caused by MPI parallelisation, which makes automatic BR verification in scientific workflows involving such codebases a time-consuming challenge.
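To make the notion of BR concrete: two runs are bitwise reproducible only if every output value has the same bit pattern, which is stricter than numerical equality. The sketch below is an illustrative assumption, not the verification procedure used for OceanVar; the function `first_bitwise_mismatch` and the sample fields are hypothetical.

```python
import struct


def to_bits(x: float) -> bytes:
    """Pack a float into its 8-byte IEEE 754 double representation."""
    return struct.pack(">d", x)


def first_bitwise_mismatch(reference, candidate):
    """Return the index of the first element whose bits differ, or None."""
    if len(reference) != len(candidate):
        raise ValueError("fields must have the same length")
    for i, (r, c) in enumerate(zip(reference, candidate)):
        if to_bits(r) != to_bits(c):
            return i
    return None


# Hypothetical flattened output fields from runs of the same experiment.
run_a = [1.0, 0.6000000000000001, 0.0]
run_b = [1.0, 0.6, 0.0]                  # differs from run_a in the last bit of element 1
run_c = [1.0, 0.6000000000000001, -0.0]  # equal under ==, but the sign bit of element 2 differs

print(first_bitwise_mismatch(run_a, run_a))  # None
print(first_bitwise_mismatch(run_a, run_b))  # 1
print(first_bitwise_mismatch(run_a, run_c))  # 2
```

Reporting the first mismatching index rather than a plain pass/fail gives a starting point for debugging, since it indicates where in the field the two runs begin to diverge.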

However, BR is sometimes considered unrealistic in workflows involving heterogeneous computing architectures. As an alternative, statistical reproducibility (SR) has been proposed and explored by various research groups in the Earth system modelling community, and automated tools have been developed for it. For example, the scientific workflow of CESM supports automatic verification of SR using the CESM-ECT framework/PyCECT software. If SR verification fails, a root-cause analysis tool, CESM-RUANDA, exists, although it is currently not fully functional. We explore SR as an alternative and complementary approach to BR, focusing on its potential to support numerical reproducibility in workflows involving heterogeneous computing architectures.
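The idea behind SR can be illustrated with a deliberately simplified check: instead of demanding identical bits, a new run is accepted if a summary diagnostic falls within the spread of a trusted ensemble. The sketch below is an assumption for illustration only and is not the algorithm implemented in CESM-ECT/PyCECT; the function name, the `n_sigma` threshold, and the ensemble values are all hypothetical.

```python
import statistics


def is_statistically_consistent(ensemble, candidate, n_sigma=3.0):
    """Accept candidate if it lies within n_sigma sample standard deviations
    of the ensemble mean (a simple illustrative criterion)."""
    mean = statistics.mean(ensemble)
    spread = statistics.stdev(ensemble)
    if spread == 0.0:
        # A degenerate ensemble only accepts an identical value.
        return candidate == mean
    return abs(candidate - mean) <= n_sigma * spread


# Hypothetical global-mean diagnostic from five trusted ensemble members.
ensemble = [1.00, 1.01, 0.99, 1.02, 0.98]

print(is_statistically_consistent(ensemble, 1.005))  # candidate within the ensemble spread
print(is_statistically_consistent(ensemble, 1.5))    # candidate far outside the ensemble spread
```

A check of this kind tolerates last-bit differences between architectures while still flagging runs that differ by more than the ensemble's natural variability, which is why SR is described as complementary to BR rather than a replacement for it.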

How to cite: Carere, F., Mele, F., Epicoco, I., Adani, M., Oddo, P., Jansen, E., Cipollone, A., and Aydogdu, A.: Workflows for numerical reproducibility in the OceanVar data assimilation model, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-18040, https://doi.org/10.5194/egusphere-egu25-18040, 2025.