EGU26-20131, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-20131
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Thursday, 07 May, 15:25–15:35 (CEST) | Room -2.15
Solving the lack of data issue for machine learning for rare climate events
Amaury Lancelin1,2, Freddy Bouchet1, Alexander Wikner3, Pedram Hassanzadeh3, Laurent Dubus2, and Peter Werner1
  • 1LMD/IPSL, CNRS, ENS, Université PSL, École Polytechnique, Institut Polytechnique de Paris, Sorbonne Université, Paris, France
  • 2Réseau de Transport d'Électricité (RTE), Paris, France
  • 3Department of the Geophysical Sciences, University of Chicago, Chicago, IL, USA

Machine learning is reshaping the entire climate-modelling chain, from climate model development to the study of extreme climate events and their impacts. One of the key drivers of this revolution is the availability of datasets that are sufficiently large for training and validation. For extreme climate events, however, this requirement poses seemingly insurmountable challenges: we need to assess the impacts of unprecedented events for which historical data are too scarce; we must rely on models, yet simulating extremely rare events with them is prohibitively expensive; and any statistical approach, including machine learning, suffers from a severe lack-of-data problem.

Here, we argue that the only viable path forward is to integrate machine learning directly into the data-generation process, in close interaction with state-of-the-art physics-based climate models and observational datasets.

The first building block of our approach is the development of state-of-the-art climate model emulators. AI models trained on historical reanalyses to emulate the dynamics of the global atmosphere have demonstrated both high forecast skill and drastically reduced computational costs. Some of these AI emulators can generate stable trajectories spanning multiple decades, which, combined with their affordability, has the potential to significantly reduce uncertainties related to extreme weather. However, it remains impossible to directly validate whether AI emulators can reliably estimate the risk of extreme events with return times exceeding the historical record. To address this issue, we develop a methodology based on state-of-the-art architectures, with the explicit requirement that emulators exhibit extremely long-term stability, high fidelity, and a faithful reproduction of the stationary statistics of the climate model.
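The abstract does not specify how "faithful reproduction of the stationary statistics" is measured; one simple illustrative check (an assumption, not the authors' method) is to compare the climatological distribution of a variable in a long emulator rollout against the climate model's own, for instance with a two-sample Kolmogorov–Smirnov distance:

```python
import numpy as np

def ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov distance: the maximum gap
    between the empirical CDFs of the two samples."""
    a, b = np.sort(sample_a), np.sort(sample_b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return np.abs(cdf_a - cdf_b).max()

# Illustration: daily temperatures from a reference run vs. an emulator rollout
rng = np.random.default_rng(0)
reference = rng.normal(15.0, 5.0, size=36_500)   # hypothetical GCM climatology
rollout = rng.normal(15.2, 5.1, size=36_500)     # hypothetical emulator rollout
d = ks_distance(reference, rollout)              # small d: similar climatologies
```

A distributional distance of this kind only probes marginal statistics; stability and fidelity of the dynamics would need separate diagnostics.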

In a first-of-its-kind experiment, we simulate 100,000 years of a stationary climate using PlaSim, a coarse-resolution general circulation model. We then train a set of stable AI emulators using only 100 years of data, and compare the return times of extreme heat waves over Western Europe and the Pacific Northwest, as well as of severe precipitation events over the Tropics, estimated by the emulators against those computed from the full 100,000-year simulation.
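The abstract does not detail the return-time estimator; a common simple choice (used here purely for illustration) ranks the annual maxima of the diagnostic and assigns the k-th largest of N block maxima a return time of roughly N/k years:

```python
import numpy as np

def empirical_return_times(annual_maxima):
    """Empirical return-time curve from a series of annual maxima.

    The k-th largest of N annual maxima is exceeded with probability
    of roughly k/N per year, giving a return time of N/k years.
    Returns (threshold levels, return times), both sorted so the
    rarest event comes first.
    """
    levels = np.sort(np.asarray(annual_maxima, dtype=float))[::-1]
    ranks = np.arange(1, len(levels) + 1)
    return levels, len(levels) / ranks

# Illustration: 100,000 synthetic "annual maxima"
rng = np.random.default_rng(0)
levels, return_times = empirical_return_times(
    rng.gumbel(30.0, 2.0, size=100_000)
)
```

Overlaying such curves from emulator rollouts and from the long reference run, with confidence intervals, is one natural way to test whether an emulator trained on 100 years extrapolates to return times far beyond its training record.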

The second building block of our approach consists of rare-event simulation techniques that reduce by several orders of magnitude the computational cost of sampling extremely rare events with CMIP-class climate models. The third building block is the blending of historical observations with CMIP model output within a Bayesian framework to estimate the probability of extremely rare events constrained by observations. In this talk, we also briefly discuss the second and third building blocks and their connections to the first within a comprehensive, integrated framework.
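The abstract does not name the rare-event simulation algorithm; one widely used family in this setting is cloning/splitting schemes such as the Giardina–Kurchan–Tailleur–Lecomte (GKTL) algorithm. The toy sketch below (an illustration on an AR(1) process, not the authors' implementation) shows the core clone/kill step: trajectories are periodically resampled with weights exp(k Δs), and the running log-normalisation is retained so statistics can be de-biased afterwards.

```python
import numpy as np

def gktl_sketch(n_traj, n_steps, resample_every, k, rng):
    """Toy cloning/splitting rare-event scheme (GKTL-style) on an
    AR(1) process x <- 0.9 x + noise, biased towards large x.

    At each resampling time, trajectories are cloned or killed with
    probabilities proportional to exp(k * (score increment)); the
    accumulated log of the mean weight is kept so that expectations
    can be de-biased afterwards.
    """
    x = np.zeros(n_traj)
    s_prev = x.copy()          # score: the state itself (toy choice)
    log_norm = 0.0
    for t in range(1, n_steps + 1):
        x = 0.9 * x + rng.normal(size=n_traj)
        if t % resample_every == 0:
            w = np.exp(k * (x - s_prev))      # clone/kill weights
            log_norm += np.log(w.mean())      # running normalisation
            x = x[rng.choice(n_traj, size=n_traj, p=w / w.sum())]
            s_prev = x.copy()
    return x, log_norm

rng = np.random.default_rng(0)
final_states, log_norm = gktl_sketch(64, 50, 10, k=0.5, rng=rng)
```

With k > 0 the surviving ensemble concentrates on large excursions that direct sampling would almost never reach; the same principle, applied to a GCM with a score targeting, e.g., seasonal heat, is what yields the orders-of-magnitude cost reduction mentioned above.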

How to cite: Lancelin, A., Bouchet, F., Wikner, A., Hassanzadeh, P., Dubus, L., and Werner, P.: Solving the lack of data issue for machine learning for rare climate events, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-20131, https://doi.org/10.5194/egusphere-egu26-20131, 2026.