EGU26-153, updated on 13 Mar 2026
https://doi.org/10.5194/egusphere-egu26-153
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Wednesday, 06 May, 16:40–16:50 (CEST)
 
Room -2.21
WeGen FastEvaluation: An open-source tool for the evaluation and comparison of machine learning models in weather and climate applications
Ilaria Luise1, Savvas Melidonis2, Julius Polz3, Sorcha Owens4, Timothee Hunter1, Christian Lessig1, and Michael Tarnawa2
  • 1European Centre for Medium-Range Weather Forecasts, ECMWF, Bonn, Germany
  • 2Jülich Supercomputing Centre, Jülich, Germany
  • 3Karlsruhe Institute of Technology, Karlsruhe, Germany
  • 4UK Met Office, Exeter, United Kingdom

The next generation of machine learning (ML) weather and climate models is increasingly trained on a wide variety of datasets, including reanalyses, forecasts, and observations. Existing evaluation tools typically cannot handle this diversity, as they are often limited to gridded data or fixed lead times. Furthermore, many existing evaluation frameworks are developed internally by institutions, remain closed-source, and lack interoperability across platforms and high-performance computing (HPC) environments. This creates a gap in the ability to systematically assess model skill across different data streams, experiments, and computing infrastructures.

The WeGen FastEvaluation tool, developed within the WeatherGenerator project, aims to bridge this gap. It provides a flexible, open-source framework designed to evaluate machine learning–based weather prediction models across a wide range of dataset types and formats. Unlike most existing tools, WeGen FastEvaluation makes minimal assumptions about data structure, allowing consistent analysis of both gridded and unstructured inputs, deterministic and probabilistic outputs, and multiple forecast lead times. Built on xarray, WeGen FastEvaluation supports multi-dimensional data handling, including probabilistic outputs and ensemble forecasts. The tool enables efficient computation of skill metrics and the generation of 2D visualizations, allowing users to compare an arbitrary number of model runs across different data streams and forecast configurations.
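As a rough illustration of the kind of computation such an xarray-based workflow performs, the sketch below computes an ensemble-mean RMSE per lead time with synthetic data. This is a minimal, hypothetical example: the variable names, dimension labels ("member", "lead_time", "lat", "lon"), and the choice of metric are illustrative assumptions, not the actual WeGen FastEvaluation API.

```python
# Minimal sketch of an ensemble skill-metric computation with xarray.
# All names and dimensions here are illustrative assumptions, not the
# WeGen FastEvaluation API.
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)

# Synthetic ensemble forecast: 4 members, 3 lead times, 5x5 grid.
forecast = xr.DataArray(
    rng.normal(size=(4, 3, 5, 5)),
    dims=("member", "lead_time", "lat", "lon"),
    name="forecast",
)
# Synthetic reference field (e.g. a reanalysis) on the same grid.
reference = xr.DataArray(
    rng.normal(size=(3, 5, 5)),
    dims=("lead_time", "lat", "lon"),
    name="reference",
)

# Collapse the ensemble to its mean, then take the RMSE over the
# spatial dimensions, leaving one score per lead time.
ens_mean = forecast.mean(dim="member")
rmse = np.sqrt(((ens_mean - reference) ** 2).mean(dim=("lat", "lon")))

print(rmse.values)  # one RMSE value per lead time
```

Because xarray reductions are label-based, the same expression works whether the spatial dimensions are a regular lat/lon grid or a single unstructured "cell" dimension, which is the property that lets a tool of this kind treat gridded and unstructured inputs uniformly.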

The presentation will introduce the design and capabilities of WeGen FastEvaluation, highlighting its integration within the WeatherGenerator workflow. Through examples, we demonstrate how the tool enables consistent benchmarking, collaborative analysis across HPC systems, and reproducible ML-for-weather research.



How to cite: Luise, I., Melidonis, S., Polz, J., Owens, S., Hunter, T., Lessig, C., and Tarnawa, M.: WeGen FastEvaluation: An open-source tool for the evaluation and comparison of machine learning models in weather and climate applications, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-153, https://doi.org/10.5194/egusphere-egu26-153, 2026.