The recently released suite of AI-based medium-range forecast models can produce multi-day forecasts within seconds, with skill on par with the IFS model of ECMWF. Traditional model evaluation predominantly targets global scores on single levels. Specific prediction tasks, such as forecasting severe convective environments, require much greater precision on the local scale and correct vertical gradients between levels. Focusing on the North American and European convective season of 2020, we assess the performance of Panguweather, Graphcast and Fourcastnet for convective available potential energy (CAPE) and storm-relative helicity (SRH) at lead times of up to 7 days.
In the example of a US tornado outbreak on April 12 and 13, 2020, all models predict elevated CAPE and SRH values multiple days in advance. The spatial structures in the AI models are smoother than those in IFS and the ERA5 reanalysis. The models show differing biases in their CAPE predictions, with Graphcast capturing the value distribution most accurately and Fourcastnet showing a consistent underestimation.
By advancing the assessment of large AI models towards process-based evaluation, we lay the foundation for hazard-driven applications of AI weather forecasts.
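
To illustrate why quantities such as SRH depend on the vertical structure between levels rather than on any single level, the following is a minimal sketch of a discrete SRH calculation from a wind profile. It is not the computation used in this work: the function name `storm_relative_helicity`, the default 3 km integration depth, and the assumption that the storm motion vector is already known are all illustrative choices. The sketch also ignores interpolation to the exact top of the layer.

```python
import numpy as np


def storm_relative_helicity(u, v, z, storm_u, storm_v, depth=3000.0):
    """Discrete storm-relative helicity (m^2 s^-2) over the lowest `depth` metres.

    Hypothetical helper for illustration only.
    u, v    : wind components (m/s) on levels ordered from the surface upward
    z       : height above ground (m) of each level, ascending
    storm_u,
    storm_v : assumed storm motion components (m/s), supplied by the caller
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    z = np.asarray(z, dtype=float)

    # Keep only levels inside the integration layer (no interpolation to its top).
    inside = z <= depth
    su = u[inside] - storm_u  # storm-relative wind components
    sv = v[inside] - storm_v

    # Sum over adjacent level pairs; each term is twice the signed area swept
    # out on the hodograph between two levels, as seen from the storm motion.
    return float(np.sum(su[1:] * sv[:-1] - su[:-1] * sv[1:]))


# Southerly wind at the surface veering to westerly at 1 km, stationary storm.
srh = storm_relative_helicity([0.0, 10.0], [10.0, 0.0], [0.0, 1000.0], 0.0, 0.0)
print(srh)
```

Because every term couples two adjacent levels, an error in the wind at one level changes the contribution of both neighbouring layers, which is why a model can score well on single-level metrics and still misrepresent SRH.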