- 1 Bern University of Applied Sciences, School of Engineering and Computer Science, Biel, Switzerland
- 2 University of Lausanne, Lausanne, Switzerland
- 3 ETH Zurich, Zurich, Switzerland
- 4 DXT Commodities, Lugano, Switzerland
- 5 TU Delft, Delft, Netherlands
Subseasonal weather forecasts (2 weeks to 2 months ahead) inform operational planning in many societally relevant sectors, including energy supply and demand, but their predictive skill varies widely. We propose machine learning (ML) as a computationally inexpensive tool to estimate forecast skill in advance, aiding decision-makers. In our study, an ML model learns to relate the forecast initial conditions in historical weather data to the probabilistic forecast error at subseasonal lead times. Explainability techniques further let us rank the sources of subseasonal predictability in hindcast data by their importance, a first to our knowledge.
A gradient-boosted decision tree model is trained to predict the Continuous Ranked Probability Score (CRPS) of ECMWF hindcasts at lead times of 0-46 days from initial conditions (geopotential height, sea surface temperature, zonal wind speed) extracted from the ECMWF Reanalysis v5 (ERA5). The ERA5 fields undergo dimensionality reduction (e.g., principal component analysis) before being fed to the ML model, and are supplemented with pre-computed indices such as the El Niño-Southern Oscillation index. Forecast skill is computed for the 500 hPa geopotential height field over Europe, with ERA5 as ground truth.
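As an illustration of the target quantity, the CRPS of a single ensemble forecast can be sketched as below. This is a minimal sketch that assumes the standard ensemble estimator (mean absolute error of the members minus half the mean absolute difference between member pairs); the abstract does not state which CRPS estimator was used, and the function name `crps_ensemble` is hypothetical.

```python
import numpy as np


def crps_ensemble(members, observation):
    """CRPS of one ensemble forecast against one scalar observation.

    Assumption: the standard (non-"fair") ensemble estimator
        CRPS = mean_i |x_i - y| - 0.5 * mean_{i,j} |x_i - x_j|
    Lower values indicate a better probabilistic forecast.
    """
    members = np.asarray(members, dtype=float)
    # Term 1: mean absolute error of the members against the observation.
    mae = np.mean(np.abs(members - observation))
    # Term 2: mean absolute difference over all member pairs (ensemble spread).
    spread = np.mean(np.abs(members[:, None] - members[None, :]))
    return mae - 0.5 * spread


# A single-member "ensemble" reduces to the absolute error.
single = crps_ensemble([1.0], 3.0)
# Two members bracketing the observation score better than either alone.
bracket = crps_ensemble([0.0, 2.0], 1.0)
```

In a setup like the one described, one such score per forecast start date and lead time (aggregated over the European domain) would form the regression target for the ML model.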
The ML model outperforms a climatological baseline (CRPS averaged by calendar date and lead time) in predicting European forecast skill out to week 7. Besides lead time and calendar date, the most important predictor of skill is the strength of the stratospheric polar vortex. Training separate models by lead time reveals clear differences in feature importance: lead time contributes the most predictability in the first 2 weeks, while the seasonal cycle manifests strongly in weeks 3-4. Different teleconnections become important at different lead times, but their predictive potential also fluctuates throughout the year. We will provide an in-depth breakdown of the feature importances by lead time and season in our presentation.
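The climatological baseline mentioned above can be sketched as a lookup table of mean CRPS per (calendar day, lead time) pair. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the MSE-based skill score used for the comparison is an assumption, since the abstract does not say which metric was used to compare the ML model with the baseline.

```python
import numpy as np


def fit_climatology(day_of_year, lead_day, crps):
    """Map each (day_of_year, lead_day) pair to its mean training CRPS."""
    sums, counts = {}, {}
    for d, l, c in zip(day_of_year, lead_day, crps):
        key = (int(d), int(l))
        sums[key] = sums.get(key, 0.0) + float(c)
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


def predict_climatology(table, day_of_year, lead_day):
    """Look up the climatological CRPS; raises KeyError for unseen pairs."""
    return np.array(
        [table[(int(d), int(l))] for d, l in zip(day_of_year, lead_day)]
    )


def skill_vs_baseline(y_true, y_model, y_baseline):
    """Assumed metric: 1 - MSE_model / MSE_baseline (positive = model wins)."""
    y_true = np.asarray(y_true, dtype=float)
    mse_model = np.mean((np.asarray(y_model, dtype=float) - y_true) ** 2)
    mse_base = np.mean((np.asarray(y_baseline, dtype=float) - y_true) ** 2)
    return 1.0 - mse_model / mse_base


# Toy numbers for illustration only (not results from the study).
table = fit_climatology([1, 1, 2], [7, 7, 7], [10.0, 20.0, 30.0])
baseline = predict_climatology(table, [1, 2], [7, 7])
skill = skill_vs_baseline([14.0, 32.0], [14.0, 31.0], baseline)
```

A positive skill value means the model's CRPS predictions are closer to the realized CRPS than the climatological averages are; evaluating it separately per lead-time week would show how far out the advantage persists.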
In conclusion, machine learning provides a novel way to estimate a priori the forecast skill of numerical weather prediction models. The presented method enables us for the first time to rank the relative contributions of the sources of forecast skill, as deduced from hindcast data, thereby advancing our understanding of subseasonal predictability.
How to cite: Mârza, A.-C., Domeisen, D. I. V., Ramella-Pralungo, L., and Meyer, A.: Unraveling the sources of subseasonal predictability with machine learning, EMS Annual Meeting 2025, Ljubljana, Slovenia, 7–12 Sep 2025, EMS2025-160, https://doi.org/10.5194/ems2025-160, 2025.