EGU26-19650, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-19650
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Thursday, 07 May, 08:30–10:15 (CEST), Display time Thursday, 07 May, 08:30–12:30
Hall X5, X5.219
Global Evaluation of Probabilistic AI Weather Forecasts Across Extremes and Regimes
Marc Girona-Mata¹˒², Andrew Orr², and Richard Turner¹
  • ¹Department of Engineering, University of Cambridge, Cambridge, United Kingdom (mg963@cam.ac.uk)
  • ²British Antarctic Survey, Cambridge, United Kingdom

Recent probabilistic machine learning weather forecasting models have demonstrated competitive skill relative to state-of-the-art (SOTA) numerical weather prediction ensemble systems. However, rigorous global assessments of their skill remain limited, particularly in the distribution tails relevant to extremes and across different geographical regions. Here, we present a systematic evaluation of several SOTA probabilistic AI weather forecasting systems against ECMWF's Integrated Forecasting System Ensemble (IFS ENS), focusing on forecast skill across the full range of event intensities.

We analyse global forecasts at 24- and 72-hour lead times for near-surface temperature, 10 m wind speed, and total precipitation at 0.25° resolution over the 2024–2025 period. Forecasts are evaluated using the fair Continuous Ranked Probability Score (fCRPS), which accounts for differing ensemble sizes, together with other complementary metrics. We also employ the threshold-weighted CRPS (twCRPS), computed for quantiles ranging from the median up to the one-in-a-million extreme event. Scores are area-weighted and analysed i) globally, ii) over land only, and iii) for individual regions.
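The scores named above can be illustrated with a short NumPy sketch. It is a minimal illustration under stated assumptions, not the authors' evaluation code. The fair CRPS of an m-member ensemble x_1, …, x_m against an observation y is taken as fCRPS = (1/m) Σ_i |x_i − y| − 1/(2m(m − 1)) Σ_i Σ_j |x_i − x_j|. For the twCRPS we assume the indicator weight w(z) = 1{z ≥ t}, evaluated through the chaining function v(z) = max(z, t), and we apply the fair kernel to the transformed ensemble (an assumption). The threshold t is passed in as a per-grid-point array, because how the quantile thresholds are derived is not stated here. Area weighting uses cos(latitude) on a regular latitude-longitude grid, with an optional boolean mask for land-only or regional averages. The function names fair_crps, fair_twcrps and area_weighted_mean are hypothetical, and the demo data are synthetic.

```python
import numpy as np


def fair_crps(ens, obs):
    """Fair CRPS of an m-member ensemble (members along axis 0).

    ens: array of shape (m, ...); obs: array broadcastable to ens.shape[1:].
    Returns the pointwise score with shape ens.shape[1:].
    """
    ens = np.asarray(ens, dtype=float)
    obs = np.asarray(obs, dtype=float)
    m = ens.shape[0]
    if m < 2:
        raise ValueError("the fair CRPS needs at least two members")
    # First term: mean absolute error of the members against the observation.
    skill = np.mean(np.abs(ens - obs), axis=0)
    # Second term: sum of |x_i - x_j| over all ordered member pairs, computed
    # from the sorted members as 2 * sum_i (2i - m - 1) * x_(i), i = 1..m.
    x_sorted = np.sort(ens, axis=0)
    coeff = (2 * np.arange(1, m + 1) - m - 1).reshape((m,) + (1,) * (ens.ndim - 1))
    pair_sum = 2.0 * np.sum(coeff * x_sorted, axis=0)
    # Fair normalisation 2 m (m - 1); the standard ensemble CRPS uses 2 m^2.
    return skill - pair_sum / (2 * m * (m - 1))


def fair_twcrps(ens, obs, threshold):
    """Threshold-weighted CRPS, assuming the weight w(z) = 1{z >= threshold}.

    Chaining function v(z) = max(z, threshold): the twCRPS is the CRPS of the
    transformed ensemble against the transformed observation. Using the fair
    kernel on the transformed members is an assumption of this sketch.
    """
    return fair_crps(np.maximum(ens, threshold), np.maximum(obs, threshold))


def area_weighted_mean(field, lat_deg, mask=None):
    """cos(latitude)-weighted mean of a (lat, lon) field on a regular grid.

    An optional boolean mask (e.g. land only, or a region) restricts the
    average to the selected grid points.
    """
    field = np.asarray(field, dtype=float)
    w = np.broadcast_to(np.cos(np.deg2rad(lat_deg))[:, None], field.shape)
    if mask is not None:
        w = w * mask
    return float(np.sum(w * field) / np.sum(w))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lat = np.linspace(-90.0, 90.0, 73)  # coarse 2.5-degree grid for the demo
    obs = rng.normal(size=(73, 144))  # synthetic "truth"
    ens = obs + rng.normal(size=(8, 73, 144))  # synthetic 8-member ensemble
    # Hypothetical per-grid-point thresholds standing in for climatological quantiles.
    q90 = np.quantile(rng.normal(size=(200, 73, 144)), 0.9, axis=0)
    print("global fCRPS :", area_weighted_mean(fair_crps(ens, obs), lat))
    print("global twCRPS:", area_weighted_mean(fair_twcrps(ens, obs, q90), lat))
```

Sorting the members avoids materialising all m × m member differences, which would be costly on a 0.25° global grid (e.g. 721 × 1440 points). If the threshold lies below every member and the observation, the twCRPS reduces to the fCRPS; if it lies above all of them, the score is zero, so only the tail above t contributes.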

AI-based forecasts demonstrate comparable or improved probabilistic skill relative to the IFS ensemble in the bulk of the distribution, with particularly strong performance over tropical and mid-latitude oceans. However, skill systematically degrades at high quantiles for most variables, with more pronounced losses over land and at short lead times. Both diffusion- and CRPS-based probabilistic forecasts are competitive, but their relative skill varies across variables. Spatial diagnostics reveal coherent regime-dependent behaviour, with AI models underperforming in complex terrain and coastal regions where the IFS ENS retains a clear advantage. 

These results highlight both the promise and the current limitations of probabilistic AI weather forecasting models, emphasising that headline global skill can mask substantial degradation for extreme events and in specific regions.

How to cite: Girona-Mata, M., Orr, A., and Turner, R.: Global Evaluation of Probabilistic AI Weather Forecasts Across Extremes and Regimes, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-19650, https://doi.org/10.5194/egusphere-egu26-19650, 2026.