EGU26-13223, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-13223
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Thursday, 07 May, 09:00–09:10 (CEST)
Room -2.15
Scale-dependent analysis of the accuracy–activity trade-off in AI weather forecasts
Britta Seegebrecht1, Sabrina Wahl1, Stefanie Hollborn1, Erik Pavel2, Wael Almikaeel2, Michael Langguth2, Martin Schultz2, Christian Lessig3, Ilaria Luise3, Juergen Gall4, Anas Al-Iahham4, and Mohamad Hakam Shams Eddin4
  • 1Deutscher Wetterdienst (DWD), Offenbach, Germany (britta.seegebrecht@dwd.de)
  • 2Jülich Supercomputing Centre (JSC), Jülich, Germany
  • 3European Centre for Medium-Range Weather Forecasts (ECMWF), Bonn, Germany
  • 4Lamarr Institute for Machine Learning and Artificial Intelligence, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany

Data-driven weather prediction models based on artificial intelligence (AI) have rapidly advanced in recent years and are frequently reported to outperform traditional physics-based numerical weather prediction (NWP) models for selected verification scores. However, optimization with respect to a specific loss function can adversely affect other metrics, potentially leading to unrealistic forecast characteristics, such as overly smooth spatial structures when mean-squared or mean-absolute error–based loss functions are used.

A robust and meaningful comparison of AI-based and NWP models therefore requires a carefully chosen and diverse set of verification metrics that accounts for potential dependencies between them. The main focus is placed on the prominent forecast accuracy–activity trade-off, which is associated with the double-penalty problem of deterministic forecasts. Related questions include: How sensitive is the relationship between accuracy and activity metrics to the choice of verification measure? Are there systematic differences between AI-based and NWP models? What is the impact of the (in)dependence between the AI training loss function and the verification metrics on the assessment of forecast skill?
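The double-penalty mechanism behind this trade-off can be illustrated with a minimal synthetic sketch (all fields, scores, and parameters below are illustrative assumptions, not the project's actual data or metrics): a sharp but slightly displaced feature is penalized twice by RMSE (miss plus false alarm), so a smoothed forecast can score a lower RMSE while exhibiting lower forecast activity, here taken as the spatial standard deviation of the field.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(256)

# Synthetic "truth": a sharp localized feature plus weak background noise
truth = np.exp(-0.5 * ((x - 100) / 4.0) ** 2) + 0.05 * rng.standard_normal(x.size)

# Forecast A: the same sharp feature, displaced by 10 grid points
# (RMSE penalizes it twice: a miss at x=100 and a false alarm at x=110)
fcst_sharp = np.exp(-0.5 * ((x - 110) / 4.0) ** 2) + 0.05 * rng.standard_normal(x.size)

# Forecast B: a heavily smoothed version of forecast A (broad, weakened feature)
kernel = np.ones(15) / 15.0
fcst_smooth = np.convolve(fcst_sharp, kernel, mode="same")

def rmse(f, o):
    """Root-mean-square error between forecast f and observation o."""
    return float(np.sqrt(np.mean((f - o) ** 2)))

def activity(f):
    """Forecast activity, here simply the spatial standard deviation."""
    return float(np.std(f))

for name, f in [("sharp, displaced", fcst_sharp), ("smoothed", fcst_smooth)]:
    print(f"{name:16s}  RMSE={rmse(f, truth):.3f}  activity={activity(f):.3f}")
```

Running this shows the smoothed forecast attaining a lower RMSE than the sharp, displaced one, but at the cost of reduced activity, which is the trade-off a diverse metric set is meant to expose.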

These questions are addressed using both scale-independent and scale-dependent verification metrics, allowing the quantification of forecast performance on individual spatial scales.
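One simple way to make a verification score scale-dependent, shown here as an illustrative sketch rather than the study's actual methodology, is to band-pass the error field in spectral space and compute the score per wavenumber band. For a smoothed forecast, the error then concentrates at small scales:

```python
import numpy as np

def bandpass(field, k_lo, k_hi):
    """Keep only wavenumbers k with k_lo <= k < k_hi of a periodic 1-D field."""
    spec = np.fft.rfft(field)
    k = np.arange(spec.size)
    mask = (k >= k_lo) & (k < k_hi)
    return np.fft.irfft(spec * mask, n=field.size)

rng = np.random.default_rng(1)
n = 256
obs = rng.standard_normal(n)
# A deliberately "too smooth" forecast: running mean of the observations
fcst = np.convolve(obs, np.ones(9) / 9.0, mode="same")

# Per-band RMSE: large, medium, and small spatial scales (wavenumber ranges
# chosen arbitrarily for illustration)
for k_lo, k_hi in [(0, 8), (8, 32), (32, 129)]:
    err = bandpass(fcst - obs, k_lo, k_hi)
    print(f"wavenumbers {k_lo:3d}-{k_hi:3d}: RMSE = {np.sqrt(np.mean(err ** 2)):.3f}")
```

Because the band-passed components are orthogonal, the per-band mean-squared errors sum to the total MSE, so the decomposition attributes the overall score to individual spatial scales.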

As a starting point, global deterministic forecasts are considered. The analysis is partly based on forecasts from the Weather Prediction Model Intercomparison Project (WPMIP), which provides a collection of NWP and AI-model forecasts from multiple national weather services and research institutions.

The work is conducted within the RAINA project, which aims to develop a foundation model for the atmosphere with a particular focus on reliable, high-resolution forecasts of extreme wind and precipitation events. Consequently, the relation between, e.g., forecast activity and the predictive capability for extreme weather is of special interest.

How to cite: Seegebrecht, B., Wahl, S., Hollborn, S., Pavel, E., Almikaeel, W., Langguth, M., Schultz, M., Lessig, C., Luise, I., Gall, J., Al-Iahham, A., and Shams Eddin, M. H.: Scale-dependent analysis of the accuracy–activity trade-off in AI weather forecasts, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-13223, https://doi.org/10.5194/egusphere-egu26-13223, 2026.