Global Evaluation of Probabilistic AI Weather Forecasts Across Extremes and Regimes

Marc Girona-Mata; Andrew Orr; Richard Turner

doi:https://doi.org/10.5194/egusphere-egu26-19650

Session AS5.1

EGU26-19650, updated on 14 Mar 2026

EGU General Assembly 2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

Marc Girona-Mata¹,², Andrew Orr², and Richard Turner¹

¹Department of Engineering, University of Cambridge, Cambridge, United Kingdom (mg963@cam.ac.uk)
²British Antarctic Survey, Cambridge, United Kingdom

Recent probabilistic machine learning weather forecasting models have demonstrated competitive skill relative to state-of-the-art (SOTA) numerical weather prediction ensemble systems. However, rigorous global assessments of their skill, particularly in the distribution tails relevant for extremes and across geographical regions, remain limited. Here, we present a systematic evaluation of several SOTA probabilistic AI weather forecasting systems against ECMWF’s Integrated Forecasting System Ensemble (IFS ENS), focusing on forecast skill across the full range of event intensities.

We analyse global forecasts at 24- and 72-hour lead times for near-surface temperature, 10 m wind speed, and total precipitation at 0.25° resolution over the 2024–2025 period. Forecasts are evaluated using the fair Continuous Ranked Probability Score (fCRPS), which accounts for differing ensemble sizes, as well as other complementary metrics. We also employ the threshold-weighted CRPS (twCRPS), computed for quantiles ranging from the median up to the one-in-a-million extreme event. Scores are area-weighted and analysed i) globally, ii) over land only, and iii) for different regions.
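For readers unfamiliar with these scores, the following is a minimal NumPy sketch of how they are commonly computed; it is not the authors' evaluation code. It assumes the standard fair ensemble estimator of the CRPS, an indicator weight w(z) = 1{z ≥ t} for the twCRPS (implemented through the chaining function v(z) = max(z, t)), and cos(latitude) area weights on a regular latitude-longitude grid. The abstract does not state which weight function, threshold climatology, or estimator was actually used, and the function names `fair_crps`, `tw_crps`, and `area_weighted_mean` are hypothetical.

```python
import numpy as np


def fair_crps(ens, obs):
    """Fair CRPS; ens has shape (M, ...) and obs has the trailing shape (...)."""
    ens = np.asarray(ens, dtype=float)
    obs = np.asarray(obs, dtype=float)
    m = ens.shape[0]
    if m < 2:
        raise ValueError("fair CRPS needs at least two ensemble members")
    # Mean absolute error of the members against the observation.
    skill = np.abs(ens - obs).mean(axis=0)
    # Sum of |x_i - x_j| over all ordered member pairs. Dividing by
    # 2 M (M - 1) instead of 2 M^2 gives the "fair" estimator, which removes
    # the finite-ensemble-size bias. Note: this builds an (M, M, ...) array,
    # so large grids would need chunking in practice.
    pair_sum = np.abs(ens[:, None] - ens[None, :]).sum(axis=(0, 1))
    spread = pair_sum / (2.0 * m * (m - 1))
    return skill - spread


def tw_crps(ens, obs, threshold):
    """Threshold-weighted CRPS, assuming the weight w(z) = 1{z >= threshold}."""
    # Clipping forecast and observation at the threshold applies the chaining
    # function v(z) = max(z, t); the score then only rewards or penalises
    # behaviour at and above the threshold.
    return fair_crps(np.maximum(ens, threshold), np.maximum(obs, threshold))


def area_weighted_mean(field, lat_deg, mask=None):
    """cos(latitude)-weighted mean of a (n_lat, n_lon) field.

    mask is an optional boolean array (True = keep), e.g. a land mask.
    """
    field = np.asarray(field, dtype=float)
    weights = np.cos(np.deg2rad(lat_deg))[:, None] * np.ones_like(field)
    if mask is not None:
        weights = np.where(mask, weights, 0.0)
    return float((weights * field).sum() / weights.sum())
```

In this sketch, a global score would be obtained by applying `fair_crps` or `tw_crps` at every grid point and then calling `area_weighted_mean` on the resulting score field, optionally with a land or regional mask. The threshold passed to `tw_crps` would be the value of the chosen quantile (for example 0.5 up to 1 − 10⁻⁶) at each location; how those quantiles are estimated is not specified in the abstract.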

AI-based forecasts demonstrate comparable or improved probabilistic skill relative to the IFS ensemble in the bulk of the distribution, with particularly strong performance over tropical and mid-latitude oceans. However, skill systematically degrades at high quantiles for most variables, with more pronounced losses over land and at short lead times. Both diffusion- and CRPS-based probabilistic forecasts are competitive, but their relative skill varies across variables. Spatial diagnostics reveal coherent regime-dependent behaviour, with AI models underperforming in complex terrain and coastal regions where the IFS ENS retains a clear advantage.

These results highlight both the promise and current limitations of probabilistic AI weather forecasting models, emphasising that headline global skill can mask substantial degradation in extreme-event and regional reliability.

How to cite: Girona-Mata, M., Orr, A., and Turner, R.: Global Evaluation of Probabilistic AI Weather Forecasts Across Extremes and Regimes, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-19650, https://doi.org/10.5194/egusphere-egu26-19650, 2026.