Assessing the physical realism of AI-based weather forecasts: insights from extratropical storms and large-scale flow diagnostics

Soufiane Karmouche; Linus Magnusson; Tim Hewson; Thomas Haiden

https://doi.org/10.5194/egusphere-egu26-11765

Session NP5.1

EGU26-11765, updated on 14 Mar 2026

EGU General Assembly 2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

European Centre for Medium-Range Weather Forecasts (soufiane.karmouche@ecmwf.int)

Standard verification scores such as the root-mean-square error (RMSE) provide limited insight into whether machine-learning (ML) weather prediction systems reproduce the physically consistent dynamical structures that underpin high-impact weather. Here, we present a multi-faceted assessment of the physical realism of ECMWF’s Artificial Intelligence Forecasting System (AIFS), combining case-study diagnostics of severe extratropical storms with conditional verification based on the large-scale circulation.
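
One well-known reason RMSE can mislead for sharp features is the "double penalty": a realistic but slightly misplaced wind maximum is penalized both where it occurs and where it should have occurred, whereas a smoothed forecast is penalized less. The minimal 1-D sketch below uses entirely synthetic, illustrative numbers (not AIFS or IFS output), and the helper names are hypothetical. It shows a heavily smoothed forecast obtaining a lower RMSE than a sharp, displaced one while underestimating the peak by more than half.

```python
import numpy as np


def gaussian_peak(x, centre, width, amplitude):
    """1-D Gaussian bump standing in for a localized wind maximum."""
    return amplitude * np.exp(-((x - centre) ** 2) / (2.0 * width**2))


def smooth(field, sigma):
    """Convolve with a normalized Gaussian kernel (sigma in grid points, zero padding)."""
    half = int(4 * sigma)
    kernel = np.exp(-0.5 * (np.arange(-half, half + 1) / sigma) ** 2)
    return np.convolve(field, kernel / kernel.sum(), mode="same")


def rmse(forecast, truth):
    """Root-mean-square error over all grid points."""
    return float(np.sqrt(np.mean((forecast - truth) ** 2)))


x = np.arange(101.0)                       # 1-D grid, arbitrary units
truth = gaussian_peak(x, 50.0, 3.0, 30.0)  # sharp "observed" maximum, e.g. 30 m/s
sharp = gaussian_peak(x, 60.0, 3.0, 30.0)  # correct shape and peak, displaced by 10 points
blurred = smooth(sharp, sigma=10.0)        # the same forecast after heavy smoothing

for name, fc in (("sharp", sharp), ("blurred", blurred)):
    print(f"{name:8s} RMSE = {rmse(fc, truth):5.2f}   peak = {fc.max():5.1f}")
```

The smoothed forecast scores better on RMSE despite misrepresenting the extreme, which is why structure-aware diagnostics are useful alongside such scores.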

We first examine two North Atlantic storms: Storm Amy (October 2025) and Storm Eowyn (January 2025). Using diagnostics inspired by Charlton-Perez et al. (2024), we analyse frontal structure, vorticity, and surface and upper-air wind fields in AIFS-Single and AIFS Ensemble Control forecasts, benchmarked against the IFS Control forecast and the verifying analysis. While the ML systems capture storm tracks and large-scale frontal geometry well, they systematically smooth out sharp gradients, compact vorticity cores, and localized wind maxima, leading to an underestimation of extreme winds. Probabilistic training in the ensemble configuration improves realism but does not fully overcome these structural limitations.
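
The effect of smoothing on compact vorticity cores and wind maxima can be illustrated with an idealized vortex. The sketch below is not the diagnostic code used in this study: it assumes a Cartesian grid with 10 km spacing and a Gaussian streamfunction (both hypothetical choices), computes relative vorticity zeta = dv/dx - du/dy by centred differences, and then applies a Gaussian filter to the winds as a crude stand-in for the smoothing seen in ML forecasts. Both the peak vorticity and the peak wind speed drop after smoothing.

```python
import numpy as np


def relative_vorticity(u, v, dx, dy):
    """zeta = dv/dx - du/dy by centred differences (y along axis 0, x along axis 1)."""
    return np.gradient(v, dx, axis=1) - np.gradient(u, dy, axis=0)


def smooth2d(field, sigma_pts):
    """Separable Gaussian filter, sigma in grid points, zero padding at the edges."""
    half = int(4 * sigma_pts)
    kernel = np.exp(-0.5 * (np.arange(-half, half + 1) / sigma_pts) ** 2)
    kernel /= kernel.sum()

    def conv(row):
        return np.convolve(row, kernel, mode="same")

    return np.apply_along_axis(conv, 1, np.apply_along_axis(conv, 0, field))


dx = dy = 10e3                                     # hypothetical 10 km grid spacing [m]
x = np.arange(-50, 51) * dx
X, Y = np.meshgrid(x, x)                           # arrays of shape (ny, nx)
R, psi0 = 50e3, 3.3e6                              # vortex radius [m], amplitude [m2 s-1]
psi = -psi0 * np.exp(-(X**2 + Y**2) / (2 * R**2))  # cyclonic (counter-clockwise) streamfunction
u = -np.gradient(psi, dy, axis=0)                  # u = -dpsi/dy
v = np.gradient(psi, dx, axis=1)                   # v = dpsi/dx

results = {}
for label, (uu, vv) in {"sharp": (u, v), "smoothed": (smooth2d(u, 3), smooth2d(v, 3))}.items():
    zeta = relative_vorticity(uu, vv, dx, dy)
    results[label] = (zeta.max(), np.hypot(uu, vv).max())
    print(f"{label:9s} max zeta = {results[label][0]:.2e} s-1   max wind = {results[label][1]:.1f} m/s")
```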

We then present ongoing work assessing the physical consistency of ML forecasts using diagnostics of the ageostrophic-to-geostrophic wind ratio at multiple pressure levels. These diagnostics reveal systematic differences between ML-based and physics-based models, particularly in dynamically active midlatitude regions.
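
The abstract does not specify how the ratio is defined. As a minimal sketch, assume geopotential Phi and horizontal wind (u, v) on the same pressure level on a regular latitude-longitude grid, take the geostrophic wind as u_g = -(1/f) dPhi/dy and v_g = (1/f) dPhi/dx, and define the ratio as the area-weighted mean ageostrophic speed |V - V_g| divided by the area-weighted mean geostrophic speed |V_g|. This is one of several possible choices (a pointwise ratio is ill-behaved where V_g is near zero); the function names and the idealized height field are hypothetical.

```python
import numpy as np

OMEGA = 7.292e-5    # Earth's rotation rate [s-1]
A_EARTH = 6.371e6   # Earth's mean radius [m]
G = 9.80665         # standard gravity [m s-2]


def geostrophic_wind(geopotential, lat_deg, lon_deg):
    """(u_g, v_g) from geopotential [m2 s-2] on a regular lat-lon grid (lat on axis 0)."""
    lat, lon = np.deg2rad(lat_deg), np.deg2rad(lon_deg)
    f = 2 * OMEGA * np.sin(lat)[:, None]  # Coriolis parameter for each latitude row
    dphi_dy = np.gradient(geopotential, lat, axis=0) / A_EARTH
    dphi_dx = np.gradient(geopotential, lon, axis=1) / (A_EARTH * np.cos(lat)[:, None])
    return -dphi_dy / f, dphi_dx / f


def ageo_geo_ratio(u, v, geopotential, lat_deg, lon_deg):
    """Area-weighted mean |V - V_g| divided by area-weighted mean |V_g|."""
    ug, vg = geostrophic_wind(geopotential, lat_deg, lon_deg)
    w = np.broadcast_to(np.cos(np.deg2rad(lat_deg))[:, None], ug.shape)
    ageo = np.average(np.hypot(u - ug, v - vg), weights=w)
    geo = np.average(np.hypot(ug, vg), weights=w)
    return ageo / geo


lat = np.arange(30.0, 71.0)   # midlatitude band, 1 degree spacing
lon = np.arange(-60.0, 21.0)
LAT, LON = np.meshgrid(lat, lon, indexing="ij")
z = 5500 - 8 * (LAT - 50) + 60 * np.cos(np.deg2rad(4 * LON))  # idealized 500 hPa height [m]
phi = G * z
ug, vg = geostrophic_wind(phi, lat, lon)
ratio = ageo_geo_ratio(1.2 * ug, 1.2 * vg, phi, lat, lon)   # 20 % supergeostrophic flow
print(f"{ratio:.3f}")   # 0.200 by construction
```

Scaling the geostrophic wind by 1.2 returns a ratio of 0.2 by construction, a convenient sanity check; with real model output, u and v would be the forecast winds, and the ratio would be compared between ML and physics-based forecasts level by level.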

Finally, we present regime-based verification results highlighting improved AIFS performance for 2-m temperature forecasts during persistent wintertime anticyclonic conditions, illustrating ML strengths in stable large-scale regimes where physics-based forecasts suffer from long-standing systematic biases.
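
Regime-based verification amounts to stratifying forecast errors by a flow-regime label. The sketch below assumes a daily regime label is already available (the abstract does not describe how regimes are classified), selects days that belong to persistent spells of at least `min_days` consecutive days, and computes bias and RMSE inside and outside those spells. The labels ("A" for anticyclonic, "C" and "Z" for other regimes), the 3-day persistence threshold, the temperatures and the function names are all hypothetical.

```python
import numpy as np


def persistent_mask(labels, regime, min_days):
    """True on days that belong to a run of >= min_days consecutive `regime` labels."""
    is_regime = np.append(np.asarray(labels) == regime, False)  # sentinel closes a final run
    mask = np.zeros(is_regime.size - 1, dtype=bool)
    start = None
    for i, flag in enumerate(is_regime):
        if flag and start is None:
            start = i                   # a run begins
        elif not flag and start is not None:
            if i - start >= min_days:   # keep only sufficiently long runs
                mask[start:i] = True
            start = None
    return mask


def conditional_scores(forecast, observed, mask):
    """(bias, RMSE) over the selected regime days and over all remaining days."""
    err = np.asarray(forecast, dtype=float) - np.asarray(observed, dtype=float)
    return {name: (float(err[m].mean()), float(np.sqrt(np.mean(err[m] ** 2))))
            for name, m in (("regime", mask), ("other", ~mask))}


labels = ["A", "A", "C", "A", "A", "A", "A", "Z", "A"]
obs = [-3, -5, -4, -8, -9, -10, -9, -2, -1]    # synthetic daily 2-m temperature [degC]
fc = [-2, -4, -3, -10, -11, -12, -11, -1, 0]   # synthetic forecast of the same days
mask = persistent_mask(labels, "A", min_days=3)
print(mask.astype(int))                        # [0 0 0 1 1 1 1 0 0]
print(conditional_scores(fc, obs, mask))       # {'regime': (-2.0, 2.0), 'other': (1.0, 1.0)}
```

Only the four-day "A" spell passes the persistence threshold; the two shorter "A" runs are verified with the other days.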

Overall, our results highlight the importance of moving beyond general verification scores toward diagnostic and physically interpretable evaluation frameworks when assessing AI-based weather forecasts, especially for high-impact weather events.

This work is funded by the Destination Earth project.

REFERENCES:

Charlton-Perez, A.J., Dacre, H.F., Driscoll, S. et al. Do AI models produce better weather forecasts than physics-based models? A quantitative evaluation case study of Storm Ciarán. npj Clim Atmos Sci 7, 93 (2024). https://doi.org/10.1038/s41612-024-00638-w

How to cite: Karmouche, S., Magnusson, L., Hewson, T., and Haiden, T.: Assessing the physical realism of AI-based weather forecasts: insights from extratropical storms and large-scale flow diagnostics, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-11765, https://doi.org/10.5194/egusphere-egu26-11765, 2026.