Evaluating the forecast skill of machine-learning weather prediction models across a selection of extreme UK windstorms

James Hewitt; Ambrogio Volonté; Ben Harvey; Andressa Andrade Cardoso; Kieran Hunt; Natalie Harvey; Oscar Martinez-Alvarado; Suzanne Gray; Helen Dacre; Kevin Hodges

doi:https://doi.org/10.5194/egusphere-egu26-12760


EGU26-12760, updated on 14 Mar 2026


EGU General Assembly 2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.


James Hewitt¹,², Ambrogio Volonté¹,², Ben Harvey¹,², Andressa Andrade Cardoso¹, Kieran Hunt¹,², Natalie Harvey¹, Oscar Martinez-Alvarado¹,², Suzanne Gray¹, Helen Dacre¹, and Kevin Hodges¹


¹Department of Meteorology, University of Reading, Reading, UK
²National Centre for Atmospheric Science, University of Reading, Reading, UK

While numerical weather prediction (NWP) underpins existing early warning systems, its high computational cost limits scalability. Machine-learning weather prediction (MLWP) offers a promising alternative, yet its skill and reliability at forecasting wind extremes and small-scale storm features across different storms remain uncertain. Evaluating the forecast skill of MLWP models across a range of storms is therefore critical before MLWP can be integrated safely into early warning systems.

This study evaluates the performance of eight leading MLWP models in forecasting the peak 10 m and 850 hPa wind speeds, pressure minima, and relative vorticity associated with the most damaging UK windstorms from the 2023/24 winter season: Babet, Ciarán, Debi, Gerrit, Henk and Isha. MLWP models are evaluated against the ERA5 reanalysis and the IFS analysis, and benchmarked against the NWP IFS ensemble forecast. The results reveal substantial variability in MLWP forecasting skill both between storms and across models.
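As an illustration of the kind of comparison described above, the following is a minimal sketch of a peak-wind error calculation. It is not the authors' verification code: the function names, the array layout (time, latitude, longitude) and the synthetic values are all assumptions made for the example, and real use would require forecast and reference fields on a common grid, subset to each storm's region and time window.

```python
import numpy as np

def wind_speed(u, v):
    """Wind speed magnitude from zonal (u) and meridional (v) components."""
    return np.hypot(u, v)

def peak_wind_error(forecast_speed, reference_speed):
    """Forecast peak wind speed minus reference peak wind speed.

    Both arrays are assumed (hypothetically) to have shape (time, lat, lon)
    and to cover the same storm region and time window. A negative value
    means the forecast underestimates the peak.
    """
    return float(np.max(forecast_speed) - np.max(reference_speed))

# Synthetic data only, not values from the study:
# the reference has a sharp 40 m/s maximum, the forecast peaks at 34 m/s.
reference = np.full((4, 5, 5), 20.0)
reference[2, 2, 2] = 40.0
forecast = np.full((4, 5, 5), 20.0)
forecast[2, 2, 2] = 34.0

error = peak_wind_error(forecast, reference)
print(f"Peak wind error: {error:.1f} m/s")
```

The same pattern could be applied to pressure minima by swapping `np.max` for `np.min`, again assuming matching grids and windows.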

MLWP forecast skill is found to be linked to the horizontal scale and dynamical nature of the storm feature producing the strongest winds. While wind maxima associated with large-scale conveyor-belt airstreams are generally well predicted, those arising from smaller-scale features, including the cold conveyor belt and sting jets, are underestimated. MLWP model performance is also found to be variable between storms, with no clear best- or worst-performing model. The higher-resolution Aurora-0.1 model is not found to be better at forecasting wind extremes, despite the small spatial scale of the storm features producing the strongest winds in four of the storms analysed.

An in-depth, feature-based analysis is performed for Storms Henk and Isha. Henk proved challenging for both MLWP and NWP models to forecast, resulting in short-notice and inaccurate wind alerts from the Met Office. The MLWP models performed worst for Isha overall, despite the NWP models predicting it well. Across both storms, MLWP models struggled to predict small-scale features associated with extreme winds and tended to smooth sharp frontal gradients.
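To show why smoothing matters more for small-scale features, the sketch below applies the same moving-average filter to a narrow and a broad idealised wind maximum. This is a toy illustration under stated assumptions, not a description of how any MLWP model behaves internally: the Gaussian profiles, the widths, the 25 km grid spacing and the box filter are all hypothetical choices.

```python
import numpy as np

def gaussian_jet(x_km, width_km, peak=40.0):
    """Idealised 1-D wind profile (m/s) with a Gaussian shape."""
    return peak * np.exp(-0.5 * (x_km / width_km) ** 2)

def box_smooth(field, n_points):
    """Moving average over n_points grid points (a stand-in for smoothing)."""
    kernel = np.ones(n_points) / n_points
    return np.convolve(field, kernel, mode="same")

x = np.arange(-1000.0, 1001.0, 25.0)   # distance in km on a 25 km grid
narrow = gaussian_jet(x, width_km=50.0)   # small-scale wind maximum
broad = gaussian_jet(x, width_km=400.0)   # large-scale wind maximum

# Loss of peak wind speed after applying the same 9-point filter to each.
narrow_loss = narrow.max() - box_smooth(narrow, 9).max()
broad_loss = broad.max() - box_smooth(broad, 9).max()

# Sharpest gradient of the narrow feature before and after smoothing.
raw_gradient = np.abs(np.gradient(narrow, x)).max()
smooth_gradient = np.abs(np.gradient(box_smooth(narrow, 9), x)).max()

print(f"Peak loss, narrow feature: {narrow_loss:.1f} m/s")
print(f"Peak loss, broad feature:  {broad_loss:.1f} m/s")
```

In this toy setting the narrow maximum loses far more of its peak than the broad one, and its sharpest gradient weakens, which is consistent in spirit with the underestimation of small-scale wind maxima reported here.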

These results highlight critical limitations in existing MLWP models that currently make them unsuitable for replacing NWP as a primary forecasting tool for hazardous UK windstorms. However, current MLWP models could provide rapid, low-cost ensemble information that complements traditional NWP outputs, or serve as part of a hybrid ML-NWP approach, particularly if structural limitations in representing fine-scale wind maxima are acknowledged and mitigated.

How to cite: Hewitt, J., Volonté, A., Harvey, B., Andrade Cardoso, A., Hunt, K., Harvey, N., Martinez-Alvarado, O., Gray, S., Dacre, H., and Hodges, K.: Evaluating the forecast skill of machine-learning weather prediction models across a selection of extreme UK windstorms, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-12760, https://doi.org/10.5194/egusphere-egu26-12760, 2026.