EGU25-6432, updated on 25 Mar 2025
https://doi.org/10.5194/egusphere-egu25-6432
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Tuesday, 29 Apr, 14:05–14:25 (CEST)
Room 3.29/30
Large-Sample Machine Learning Models for Estimation, Attribution, and Projection of Hydrometeorological Extremes
Louise Slater (1), Michel Wortmann (1,2), Simon Moulds (3), Yinxue Liu (1,4), Boen Zhang (1), Laurence Hawker (5), Liangkun Deng (1,6), and Emma Ford (1,7)
  • (1) School of Geography and the Environment, University of Oxford, Oxford, UK (louise.slater@ouce.ox.ac.uk)
  • (2) European Centre for Medium-Range Weather Forecasts, Shinfield Park, Reading, UK
  • (3) School of Geosciences, University of Edinburgh, Edinburgh, UK
  • (4) Geography and Environment, Loughborough University, Loughborough, UK
  • (5) School of Geographical Sciences, University of Bristol, Bristol, UK
  • (6) State Key Laboratory of Water Resources Engineering and Management, Wuhan University, Wuhan, China
  • (7) Atmospheric, Oceanic and Planetary Physics, University of Oxford, Oxford, UK

The estimation, attribution, or projection of hydro-meteorological extremes at individual locations is constrained by the limited number of observations of extreme events. Recent advances in large-sample machine learning (ML) models, however, have demonstrated significant potential to mitigate the impact of data scarcity on the quantification of hydrological risks. These models integrate hundreds to thousands of time-series records alongside local descriptors of climate and catchment characteristics, enabling them to learn relationships across diverse environments and provide accurate estimates of hydro-meteorological extremes. This presentation will highlight our recent advances and the remaining challenges in developing large-sample ML models for estimating, attributing, and projecting hydro-meteorological extremes.

At the core of our ML models is the GRIT river network, a new global bifurcating network that includes multi-threaded rivers, canals, and deltas. Unlike conventional single-threaded global river networks, GRIT incorporates bifurcations derived from the 30 m Landsat-based river mask from GRWL and elevation-based streams from the FABDEM digital terrain model. This realistic depiction is critical, as 98% of floods identified in the Global Flood Database occur within 10 km of a river bifurcation. Individual river reaches in GRIT are assigned a broad range of static and time-varying variables describing the local meteorology, climate, geology, soils, geomorphology, Earth observation, terrestrial water storage, land cover time series, and socio-economic data, together with a novel archive of historical river discharge records from approximately 60,000 gauges.
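To illustrate what distinguishes a bifurcating network from a single-threaded one, the following is a minimal sketch, assuming reaches are stored as directed (upstream node, downstream node) pairs. The function name, node labels, and toy network are hypothetical and do not reflect GRIT's actual schema; the only idea shown is that a bifurcation is a node with more than one downstream neighbour.

```python
from collections import defaultdict


def find_bifurcations(edges):
    """Return, in sorted order, the nodes that have more than one downstream node.

    `edges` is an iterable of (upstream_node, downstream_node) pairs.
    This edge-list layout is an assumption made for illustration only.
    """
    downstream = defaultdict(set)
    for upstream_node, downstream_node in edges:
        downstream[upstream_node].add(downstream_node)
    return sorted(node for node, targets in downstream.items() if len(targets) > 1)


# Hypothetical toy network: A flows into B, B splits into C and D,
# and both threads rejoin at E (a simple multi-threaded reach).
toy_edges = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E")]

bifurcation_nodes = find_bifurcations(toy_edges)
print(bifurcation_nodes)
```

In a single-threaded network every node has at most one downstream neighbour, so the same function would return an empty list.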

This novel dataset enables us to tackle three key challenges: (1) Flood estimation: We estimate flood hazards globally, such as bankfull river discharge, the mean annual flood, and return periods, and assess the ability of the models to produce spatially consistent hazard estimates. By leveraging an expanded training envelope, the ML models generate reliable estimates in data-sparse regions. (2) Flood attribution: Using a range of explainability methods such as model probes, sensitivity testing, SHAP (SHapley Additive exPlanations), accumulated local effects (ALE), partial dependence plots (PDP), and gradient-based methods, we investigate flood-generating mechanisms across diverse catchment types. Explainable AI (XAI) tools enable us to interrogate the models to enhance our understanding of the physical and anthropogenic drivers of flooding. (3) Flood prediction and projection: We assess the utility of hybrid large-sample ML models trained directly on subseasonal to seasonal forecasts or Earth system model (ESM) outputs for future flood projections. We show how large-sample models can implicitly correct spatio-temporal biases in forecasts or ESM outputs and deliver reliable predictions, bypassing traditional modelling steps such as downscaling and bias correction.
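As a concrete reference for two of the quantities named under flood estimation, the sketch below computes the mean annual flood and empirical return periods from a series of annual maximum discharges at one gauge. It uses the standard Weibull plotting position, T = (n + 1) / rank, which is swapped in here for illustration and is not stated to be the authors' method; the function names and the discharge values are hypothetical, and ties between years are not treated specially.

```python
def mean_annual_flood(annual_maxima):
    """Arithmetic mean of the annual maximum discharges."""
    return sum(annual_maxima) / len(annual_maxima)


def empirical_return_periods(annual_maxima):
    """Pair each annual maximum with its Weibull empirical return period.

    Values are ranked from largest (rank 1) to smallest (rank n),
    and the return period in years is T = (n + 1) / rank.
    """
    n = len(annual_maxima)
    ranked = sorted(annual_maxima, reverse=True)
    return [(discharge, (n + 1) / rank) for rank, discharge in enumerate(ranked, start=1)]


# Hypothetical annual maximum discharges in m^3/s for four years.
annual_maxima = [120.0, 95.0, 210.0, 150.0]

maf = mean_annual_flood(annual_maxima)
return_periods = empirical_return_periods(annual_maxima)

print(maf)
for discharge, period in return_periods:
    print(discharge, period)
```

With only a handful of years, the longest return period that can be estimated this way is n + 1 years, which illustrates why single-site records constrain the estimation of rare extremes.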

Finally, we discuss key challenges in large-sample modelling, such as systematic biases in training data, inconsistencies in XAI results, causality, and the relative strengths and weaknesses of simple ML models versus deep learning. These challenges underscore the need for continued innovation in large-sample model design and application. By integrating diverse datasets and advanced ML techniques, large-sample models present transformative opportunities for flood estimation, attribution, and projection, enabling informed decision-making for the management of hydro-meteorological extremes.

How to cite: Slater, L., Wortmann, M., Moulds, S., Liu, Y., Zhang, B., Hawker, L., Deng, L., and Ford, E.: Large-Sample Machine Learning Models for Estimation, Attribution, and Projection of Hydrometeorological Extremes, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-6432, https://doi.org/10.5194/egusphere-egu25-6432, 2025.