- 1Institute of Geography and Oeschger Centre for Climate Change Research, University of Bern, Bern, Switzerland (duncan.pappert@unibe.ch)
- 2Laboratoire des Sciences du Climat et de l’Environnement, CEA-CNRS-UVSQ, Université Paris-Saclay, Gif-sur-Yvette, France
- 3Institute for Environmental Studies, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
- 4Galeio, Paris, France
- 5Mobiliar Lab for Natural Risks, University of Bern, Bern, Switzerland
High summer temperatures place significant stress on human and natural systems, often leading to severe impacts. Summer hot spells vary widely in intensity and duration, yet event duration is often overlooked or treated as a secondary aspect when studying and predicting such extremes. Different sectors of society, the economy, and the environment are vulnerable to extreme heat on different timescales; therefore, knowing the likelihood of a heat event lasting only a few days or persisting for many weeks is crucial for developing more effective adaptation strategies.
In the last decade, machine learning (ML) techniques have increasingly been used to tackle extreme weather forecasting. Among these, Random Forests (RF) have emerged as an effective tool, with some demonstrated skill in predicting the occurrence and mean amplitude of extreme near-surface temperature events. To the best of our knowledge, such statistical models have yet to be used to predict hot spell duration. This study aims to fill that gap.
The objective of this research is to assess whether an RF model can predict the duration of a hot spell from its first day. Specifically, we aim to determine whether the model can distinguish between short and long durations, covering both synoptic and subseasonal timescales. To achieve this, we develop a statistical model using data from the Community Earth System Model version 2 Large Ensemble (CESM2-LE) historical runs. For two regions in Western Europe, hot spells are defined as periods when the region-averaged deseasonalised and detrended anomalies exceed 1.5 standard deviations. The model is trained with a number of local and remote predictors, incorporating variables from the land, sea, and atmosphere. These features are provided for the days, weeks, and months leading up to the event, as well as for the first day of the event itself.
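The hot spell definition above can be illustrated with a minimal sketch. It assumes a daily series of region-averaged anomalies that has already been deseasonalised and detrended. The function name `find_hot_spells`, the single series-wide standard deviation, and the `min_duration` of 3 days are illustrative assumptions, not details confirmed by the study.

```python
from statistics import pstdev


def find_hot_spells(anomalies, threshold_sd=1.5, min_duration=3, sigma=None):
    """Return (start_index, duration_in_days) for each hot spell.

    A hot spell is taken here to be a run of consecutive days whose anomaly
    exceeds threshold_sd standard deviations. Assumptions (not from the
    source): one standard deviation for the whole series unless `sigma` is
    given, and runs shorter than `min_duration` days are discarded.
    """
    if sigma is None:
        sigma = pstdev(anomalies)
    threshold = threshold_sd * sigma

    spells = []
    run_start = None
    for day, value in enumerate(anomalies):
        if value > threshold:
            if run_start is None:
                run_start = day  # first day of a new exceedance run
        elif run_start is not None:
            duration = day - run_start
            if duration >= min_duration:
                spells.append((run_start, duration))
            run_start = None

    # Close a run that is still open at the end of the series.
    if run_start is not None:
        duration = len(anomalies) - run_start
        if duration >= min_duration:
            spells.append((run_start, duration))
    return spells
```

Each returned duration would serve as the response variable, and the start index marks the "first day" on which predictors are sampled.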
We use both an RF classifier to predict different duration cohorts (short, medium, long) and a Quantile Random Forest (QRF) to model the full conditional distribution of the response variable (event duration). A key challenge is handling a highly imbalanced dataset, with 3-day events far outnumbering events lasting beyond 10 days.
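One common way to counter this kind of imbalance is to weight each class inversely to its frequency. The sketch below shows that idea only; the abstract does not state which strategy the study uses. The cohort boundaries (`short_max`, `medium_max`) and both function names are hypothetical. The weight formula, n_samples / (n_classes × class count), is the same one scikit-learn applies for `class_weight="balanced"`.

```python
from collections import Counter


def duration_cohort(days, short_max=5, medium_max=10):
    """Map an event duration in days to a cohort label.

    The boundaries are hypothetical placeholders, not the study's values.
    """
    if days <= short_max:
        return "short"
    if days <= medium_max:
        return "medium"
    return "long"


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes (e.g. long events) receive larger weights, so errors on
    them count more during training.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Toy durations dominated by short events, mimicking the imbalance described.
durations = [3, 3, 3, 3, 4, 5, 7, 12]
labels = [duration_cohort(d) for d in durations]
weights = balanced_class_weights(labels)
```

A dictionary like `weights` could then be passed to a classifier that accepts per-class weights, such as the `class_weight` parameter of scikit-learn's `RandomForestClassifier`.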
In addition to shedding light on the statistical and dynamical relationships that drive the persistence of hot spells, the results could be relevant for climate adaptation and policy planning.
How to cite: Pappert, D., Vrac, M., Coumou, D., Tuel, A., and Martius, O.: Predicting Hot Spell Duration with Random Forests, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-17884, https://doi.org/10.5194/egusphere-egu25-17884, 2025.