EMS Annual Meeting Abstracts
Vol. 22, EMS2025-183, 2025, updated on 30 Jun 2025
https://doi.org/10.5194/ems2025-183
EMS Annual Meeting 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
EUMETNET E-AI Data Curation working group activities
Roope Tervo1, Arianna Valmassoi2, Stephan Siemen3, and Marek Jacob2
  • 1EUMETSAT, User and Climate Services, Darmstadt, Germany
  • 2Deutscher Wetterdienst (DWD), Offenbach am Main, Germany
  • 3ECMWF, Reading, United Kingdom

The "Artificial Intelligence and Machine Learning for Weather, Climate, and Environmental Applications" (E-AI) Optional Programme

E-AI is a strategic initiative adopted by the EUMETNET General Assembly and is set to run for five years. It aims to strengthen collaboration among European National Meteorological and Hydrological Services (NMHSs) and external partners, with a specific focus on AI and ML in the weather, climate, and environmental domains. A key objective is to share the results achieved under the programme through widely accepted permissive open-source licenses, fostering a culture of openness and collaboration. Deutscher Wetterdienst (DWD) has been designated as the Coordinating Member of E-AI, responsible for overseeing the programme's progress and ensuring alignment with its foundational goals. E-AI is organised into several working groups covering topics such as data curation, large language models, nowcasting, local area modelling, ethics, and regional downscaling, among others.

The Data Curation Working Group began its work in late 2024, initially focusing on a gap analysis of the data available and missing for AI/ML applications relevant to E-AI. The analysis compiles AI/ML use case descriptions covering the data used and the data missing, the data preparation processes, shortcomings in the available data (such as insufficient quality or resolution, or irregular time steps), and the challenges encountered during development. The use cases will then be mapped to the available data sources and tools to identify where the current data ecosystem falls short. The first iteration of this analysis is planned to be completed by the end of 2025.
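The mapping of use cases to available data sources can be sketched as a simple set comparison. This is a minimal, hypothetical illustration, assuming each use case lists the data sets it needs by identifier and that the catalogue of available sources is a set of such identifiers; the class, function, and data set names are invented for the example and are not part of the working group's material.

```python
from dataclasses import dataclass, field


@dataclass
class UseCase:
    """One AI/ML use case description (hypothetical structure)."""
    name: str
    required_data: set[str]  # identifiers of the data sets the use case needs
    shortcomings: list[str] = field(default_factory=list)  # e.g. "irregular time steps"


def gap_analysis(use_cases: list[UseCase], available: set[str]) -> dict[str, set[str]]:
    """Return, for each use case, the required data sets that are not available."""
    return {uc.name: uc.required_data - available for uc in use_cases}


def most_requested_gaps(gaps: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Rank missing data sets by how many use cases need them."""
    counts: dict[str, int] = {}
    for missing in gaps.values():
        for item in missing:
            counts[item] = counts.get(item, 0) + 1
    # Most frequently missing first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical example data, not taken from the actual gap analysis.
    cases = [
        UseCase("nowcasting", {"radar_composite", "satellite_ir"}),
        UseCase("downscaling", {"reanalysis", "high_res_orography"}),
        UseCase("postprocessing", {"station_obs", "high_res_orography"},
                ["irregular time steps"]),
    ]
    available = {"radar_composite", "reanalysis", "station_obs"}
    print(most_requested_gaps(gap_analysis(cases, available)))
```

Running it prints `[('high_res_orography', 2), ('satellite_ir', 1)]`. A set difference only captures data that are entirely absent; qualitative shortcomings such as insufficient resolution would still have to be assessed from the free-text notes.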

In addition to the gap analysis, the Data Curation Working Group is exploring best practices for using Zarr in AI/ML applications. The group is gathering practical experience to develop a "Zarr Best Practices" document, which may eventually evolve into a Zarr profile tailored to E-AI applications and aligned with broader standardisation efforts such as GeoZarr [1].
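One recurring question when preparing Zarr stores for training is how to chunk the arrays. The helper below is a hypothetical sketch, not a recommendation from the working group: it assumes a (time, y, x) array, keeps each spatial field in a single chunk so that reading one time step touches one chunk, and groups time steps until a chunk approaches an assumed target size of 16 MiB.

```python
import math


def time_chunk_shape(shape, itemsize, target_bytes=16 * 2**20):
    """Pick a chunk shape for a (time, y, x) array (illustrative heuristic).

    Whole spatial fields are kept in one chunk; the time axis is grouped so
    that each chunk stays at or below target_bytes where possible.
    """
    n_time, ny, nx = shape
    bytes_per_step = ny * nx * itemsize
    # At least one time step per chunk, at most the whole time axis.
    steps = max(1, min(n_time, target_bytes // bytes_per_step))
    return (steps, ny, nx)


def n_chunks(shape, chunks):
    """Number of chunks the array is split into."""
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))


if __name__ == "__main__":
    # One year of hourly float32 fields on a 0.25 degree global grid.
    shape = (8760, 721, 1440)
    chunks = time_chunk_shape(shape, itemsize=4)
    print(chunks, n_chunks(shape, chunks))
```

For this example the sketch yields chunks of `(4, 721, 1440)` and 2190 chunks in total. The resulting tuple could then be used as the chunk shape when writing, for example through the `chunks` key of the `encoding` argument of xarray's `Dataset.to_zarr`. Other access patterns, such as sampling spatial patches or reading point time series, would favour different chunk layouts.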

Future plans include developing a catalogue and data exchange mechanisms within the E-AI framework, as well as collecting examples and tutorials on data preparation and pipelines. Although the focus is on E-AI applications, the work is conducted openly in the GitHub repository https://github.com/eumetnet-e-ai/wg1_data_curation.

[1] https://github.com/zarr-developers/geozarr-spec

How to cite: Tervo, R., Valmassoi, A., Siemen, S., and Jacob, M.: EUMETNET E-AI Data Curation working group activities, EMS Annual Meeting 2025, Ljubljana, Slovenia, 7–12 Sep 2025, EMS2025-183, https://doi.org/10.5194/ems2025-183, 2025.
