Probabilistic Benchmarks and Post-Processing for Data-Driven Weather Forecasting

Tobias Biegert; Nils Koster; Sebastian Lerch

doi:https://doi.org/10.5194/egusphere-egu26-17711

EGU26-17711, updated on 14 Mar 2026

EGU General Assembly 2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

Tobias Biegert¹, Nils Koster¹, and Sebastian Lerch²,³

¹Institute of Statistics, Karlsruhe Institute of Technology, Karlsruhe, Germany (tobias.biegert@kit.edu)
²Department of Mathematics and Computer Science, Marburg University, Marburg, Germany
³Heidelberg Institute for Theoretical Studies, Heidelberg, Germany

In recent years, significant progress in machine learning has enabled the development of various artificial intelligence weather prediction (AIWP) models that approach, or even surpass, the skill of numerical weather prediction (NWP) models.

However, despite these advancements, several important questions remain open. Most data-driven models focus on deterministic point forecasts and cannot generate probabilistic predictions, which are crucial for optimal decision making and for quantifying weather risk in applications. Further, while it has been widely demonstrated that physics-based NWP models benefit substantially from post-processing methods, which aim to correct systematic errors, the use of post-processing for data-driven weather models has not been explored in detail.

Our overarching aim is thus to investigate whether various post-processing techniques can improve predictions and generate probabilistic forecasts from deterministic AIWP and NWP model outputs. We assess whether AI-based weather models benefit from post-processing to a similar extent as physics-based NWP models, enabling a fair comparison between post-processed AIWP and NWP forecasts. The resulting post-processed AIWP forecasts also yield a relatively simple probabilistic benchmark for evaluating whether inherently probabilistic AIWP models deliver skill improvements commensurate with their increased computational cost.

Experiments are based on the WeatherBench 2 framework, which provides a standardized archive of prominent AIWP as well as operational NWP model outputs. Specifically, we apply a suite of established statistical and machine learning post-processing methods to model outputs for the eight variables defined as headline scores (Z500, T850, Q700, WV850, T2M, WS10, MSLP, TP24hr) in the WeatherBench 2 framework, and systematically evaluate the effectiveness of these methods for improving deterministic and probabilistic forecasts.
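To illustrate what turning a deterministic forecast into a probabilistic one can look like, the following is a minimal sketch of a Gaussian regression-type post-processing step. It is not the specific method suite used in this work, which the abstract does not detail. The data are synthetic, and the function names (gaussian_crps, fit_gaussian_postprocessing) as well as the constant-spread assumption are illustrative choices. The continuous ranked probability score (CRPS) of a deterministic forecast reduces to the absolute error, so the two scores at the end are directly comparable.

```python
import math
import numpy as np

def gaussian_crps(mu, sigma, y):
    """Closed-form CRPS of a N(mu, sigma^2) forecast for observation y."""
    z = (y - mu) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

def fit_gaussian_postprocessing(forecast, obs):
    """Fit mu = a + b * forecast by least squares.

    Assumption for this sketch: a single constant sigma, estimated from
    the spread of the training residuals.
    """
    b, a = np.polyfit(forecast, obs, deg=1)  # highest degree first
    resid = obs - (a + b * forecast)
    sigma = resid.std(ddof=2)
    return a, b, sigma

# Synthetic stand-in for a temperature-like variable (not real model output).
rng = np.random.default_rng(0)
truth = rng.normal(280.0, 5.0, size=2000)
forecast = truth + 1.5 + rng.normal(0.0, 2.0, size=2000)  # biased and noisy

train, test = slice(0, 1500), slice(1500, 2000)
a, b, sigma = fit_gaussian_postprocessing(forecast[train], truth[train])

mu_test = a + b * forecast[test]
crps_pp = gaussian_crps(mu_test, sigma, truth[test]).mean()
mae_raw = np.abs(forecast[test] - truth[test]).mean()  # CRPS of a point forecast

print(f"raw deterministic forecast (MAE): {mae_raw:.3f}")
print(f"post-processed Gaussian (CRPS):   {crps_pp:.3f}")
```

In this toy setting the post-processed forecast scores better because the regression removes the systematic bias and the predictive spread accounts for the remaining error. Real applications would typically fit parameters per variable, lead time and location, and may estimate them by minimizing the CRPS directly.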

Results show that post-processed probabilistic forecasts can outperform the ensemble predictions from the European Centre for Medium-Range Weather Forecasts for shorter lead times of up to one week for selected variables, but the results vary across variables, lead times, post-processing methods and forecasting models.
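A comparison of this kind is commonly expressed as a CRPS skill score relative to a reference ensemble. The sketch below shows one standard way to compute it, assuming an empirical CRPS estimator for the ensemble. The function names (ensemble_crps, crps_skill_score) and the toy numbers are illustrative assumptions and do not come from the study.

```python
import numpy as np

def ensemble_crps(members, obs):
    """Empirical CRPS per case; members has shape (n_cases, n_members)."""
    # Mean absolute difference between each member and the observation.
    term1 = np.abs(members - obs[:, None]).mean(axis=1)
    # Mean absolute difference over all member pairs (ensemble spread).
    term2 = np.abs(members[:, :, None] - members[:, None, :]).mean(axis=(1, 2))
    return term1 - 0.5 * term2

def crps_skill_score(crps_model, crps_reference):
    """Positive values mean the model beats the reference on average."""
    return 1.0 - np.mean(crps_model) / np.mean(crps_reference)

# Toy example: two forecast cases with a three-member reference ensemble.
obs = np.array([2.0, 5.0])
reference_members = np.array([[1.0, 2.0, 4.0],
                              [3.0, 6.0, 8.0]])
crps_reference = ensemble_crps(reference_members, obs)

# Hypothetical per-case CRPS values of a post-processed forecast.
crps_model = np.array([0.3, 0.6])

print("skill score vs. reference:", crps_skill_score(crps_model, crps_reference))
```

A skill score of zero indicates parity with the reference ensemble, and negative values indicate that the reference is better. Computing the score separately per variable and lead time would make variation of the kind described above visible.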

How to cite: Biegert, T., Koster, N., and Lerch, S.: Probabilistic Benchmarks and Post-Processing for Data-Driven Weather Forecasting, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-17711, https://doi.org/10.5194/egusphere-egu26-17711, 2026.