Replicability testing and scientific skill quantification in Earth System Models with pyhanami

Marta Alerany Solé; Kai Keller; Chihiro Kodama; Masuo Nakano; Tomoe Nasuno; Daisuke Takasuka; Mario Acosta

doi:https://doi.org/10.5194/egusphere-egu26-6642


EGU26-6642, updated on 13 Mar 2026


EGU General Assembly 2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.


Marta Alerany Solé¹, Kai Keller¹, Chihiro Kodama², Masuo Nakano², Tomoe Nasuno², Daisuke Takasuka³, and Mario Acosta¹


¹Barcelona Supercomputing Center, Barcelona, Spain (marta.alerany@bsc.es)
²Japan Agency for Marine-Earth Science and Technology, Yokohama, Japan
³Graduate School of Science, Tohoku University, Sendai, Japan

As climate models advance toward higher resolutions, they become increasingly capable of resolving key Earth system processes, which in turn raises the need for robust and quantitative evaluation methods. In response to this challenge, we present pyhanami, an open-source Python package developed within the HANAMI project to assess the replicability and scientific skill of Earth System Models (ESMs) using statistical testing and objective, scalar-based metrics. To facilitate the practical application of these evaluations, pyhanami also features a structured data interface for efficiently loading and inspecting compatible model outputs.

An ESM is considered replicable if the same experiment, run on different computing environments or with different compilers, produces equivalent results, i.e., results that represent the same climate. This ensures that differences between simulations reflect only the intended scientific changes in the model setup. Because bit-for-bit replicability is often unattainable across environments due to the chaotic nature of climate models, our practical goal is statistical indistinguishability. Building on existing methodologies, pyhanami provides an ensemble-based replicability test that combines multiple statistical tests and metrics to determine whether two simulated ensembles are statistically indistinguishable, as described in Keller et al. (2025; doi.org/10.5194/gmd-18-10221-2025). To the best of our knowledge, automated and standardized replicability assessment is not currently supported in model evaluation tools, despite its importance for climate model development, validation, intercomparison, and porting.
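As a rough illustration of what "statistically indistinguishable" can mean for two ensembles, the sketch below compares one scalar per ensemble member (for example, a global-mean temperature) using a two-sample Kolmogorov-Smirnov statistic with a permutation-based p-value. This is not the pyhanami API or the procedure of Keller et al. (2025), which combines several tests and metrics; the function names, the choice of test, and the 0.05 threshold are all assumptions made for this example.

```python
import random


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    values = sorted(set(a) | set(b))
    d = 0.0
    for v in values:
        cdf_a = sum(x <= v for x in a) / len(a)
        cdf_b = sum(x <= v for x in b) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def permutation_pvalue(a, b, n_perm=500, seed=0):
    """Estimate a p-value by randomly reassigning members to the two ensembles."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            count += 1
    # Add-one correction keeps the estimate strictly positive.
    return (count + 1) / (n_perm + 1)


def indistinguishable(a, b, alpha=0.05):
    """Hypothetical decision rule: fail to reject the null at level alpha."""
    return permutation_pvalue(a, b) >= alpha


# Made-up member values for two platforms (illustrative only, not real model output).
platform_a = [float(i) for i in range(10)]
platform_b = [float(i) + 100.0 for i in range(10)]
print(indistinguishable(platform_a, platform_a))  # same values on both sides
print(indistinguishable(platform_a, platform_b))  # fully separated ensembles
```

In practice such a test would be applied per variable and often per grid point or region, which raises multiple-testing questions that this sketch does not address.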

Complementing replicability, the scientific skill of an ESM describes its ability to accurately reproduce observed features of the climate system, from regional patterns to large-scale teleconnections. Many existing tools for evaluating this skill rely on visualization-based diagnostics, which often require expert knowledge and can be biased by subjective interpretation. In contrast, scalar metrics and scores provide quantitative and comparable measures of scientific skill, which are essential for interpreting climate projections, guiding model development, and comparing models. However, diagnostics for physical processes that can only be properly resolved by km-scale, high-resolution global climate models remain underrepresented in state-of-the-art diagnostic suites. Although several metrics have been proposed for such small-scale processes, many lack standardized and widely available implementations. As high-resolution climate simulations become more common, the demand for objective diagnostics to support model tuning and improvement is increasing. pyhanami addresses this need by providing a growing set of scalar scientific skill metrics that enable quantitative and easily interpretable evaluation of phenomena such as tropical cyclones and the Tropical Intraseasonal Oscillation (ISO), including the Madden-Julian Oscillation and the Boreal Summer ISO modes.
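To make the idea of a scalar skill metric concrete, the sketch below reduces a model field and a reference field on a regular latitude-longitude grid to a single number, an area-weighted root-mean-square error. It is a generic illustration only: it is not one of the tropical cyclone or ISO metrics mentioned above, it does not use the pyhanami API, and the function name and the cosine-of-latitude weighting are assumptions chosen for this example.

```python
import math


def area_weighted_rmse(model, obs, lats_deg):
    """Area-weighted RMSE between two fields on a regular lat-lon grid.

    model, obs: lists of rows, one row of grid values per latitude.
    lats_deg:   latitude in degrees of each row.
    Weights are cos(latitude), an approximation of relative grid-cell area.
    """
    weighted_sq_err = 0.0
    weight_sum = 0.0
    for row_m, row_o, lat in zip(model, obs, lats_deg):
        w = math.cos(math.radians(lat))
        for m, o in zip(row_m, row_o):
            weighted_sq_err += w * (m - o) ** 2
            weight_sum += w
    return math.sqrt(weighted_sq_err / weight_sum)


# Made-up 3x4 fields (illustrative only): the "model" has a uniform bias of 2.
lats = [-45.0, 0.0, 45.0]
reference = [[1.0, 2.0, 3.0, 4.0] for _ in lats]
biased_model = [[v + 2.0 for v in row] for row in reference]
print(area_weighted_rmse(biased_model, reference, lats))
```

A single number like this can be tracked across model versions or compared between models, which is the property that makes scalar metrics convenient for tuning and intercomparison.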

By integrating replicability testing, scientific skill metrics, and visualization tools into a single, self-contained package with a generic data interface, pyhanami streamlines evaluation workflows and supports the development of reliable climate projections, advancing the quality and reproducibility of geosciences research.

How to cite: Alerany Solé, M., Keller, K., Kodama, C., Nakano, M., Nasuno, T., Takasuka, D., and Acosta, M.: Replicability testing and scientific skill quantification in Earth System Models with pyhanami, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-6642, https://doi.org/10.5194/egusphere-egu26-6642, 2026.

Supplementary materials


supplementary materials version 1 – uploaded on 04 May 2026, no comments