Global estimation of the median annual maximum flood (QMED) using explainable machine learning

Valeriya Filipova; David Leedal; Sam Clayton

https://doi.org/10.5194/egusphere-egu26-5408

Session HS3.6

EGU26-5408, updated on 13 Mar 2026


EGU General Assembly 2026

© Author(s) 2026. This work is distributed under the Creative Commons Attribution 4.0 License.


Valeriya Filipova¹, David Leedal¹, and Sam Clayton²


¹JBA Risk Management, Skipton, United Kingdom (valeriya.filipova@jbarisk.com)
²JBA Consulting, Skipton, United Kingdom

Reliable estimation of the median annual maximum flood (QMED) is central to flood risk assessment and the design of hydraulic infrastructure, particularly in ungauged basins. Traditional index-flood approaches typically delineate homogeneous regions and estimate QMED by linear regression on a small set of catchment descriptors. In practice, however, the assumptions of regional homogeneity and of linear relationships between descriptors and QMED are often violated, leading to substantial prediction uncertainty.
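The traditional regional-regression step can be sketched as follows. This is a minimal illustration, not the authors' code, and it assumes (i) QMED at a gauged site is the median of its annual maximum flow series and (ii) log(QMED) is regressed on log-transformed descriptors, a common choice that is an assumption here. All function names, the two descriptors (catchment area and annual rainfall) and the synthetic data are hypothetical.

```python
"""Sketch of a traditional regional-regression estimate of QMED (illustrative only)."""
import math
import statistics


def qmed_from_amax(annual_maxima):
    """QMED at a gauged site: the median of its annual maximum (AMAX) flows."""
    return statistics.median(annual_maxima)


def solve_linear_system(a, b):
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_log_linear(descriptors, qmed):
    """Fit log(QMED) = b0 + sum_k b_k * log(descriptor_k) by ordinary least squares.

    descriptors: one row of positive descriptor values per gauged catchment.
    qmed: one positive QMED value per catchment.
    Returns the coefficients [b0, b1, ..., bK].
    """
    x = [[1.0] + [math.log(v) for v in row] for row in descriptors]
    y = [math.log(q) for q in qmed]
    p = len(x[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict_qmed(coeffs, row):
    """Predict QMED at an ungauged catchment from its descriptors."""
    log_q = coeffs[0] + sum(b * math.log(v) for b, v in zip(coeffs[1:], row))
    return math.exp(log_q)


if __name__ == "__main__":
    # Synthetic gauged catchments: [area km2, annual rainfall mm] (hypothetical)
    rows = [[10, 600], [50, 900], [200, 700], [1000, 1200], [30, 1500]]
    qmed = [0.01 * a ** 0.8 * s ** 0.5 for a, s in rows]  # noise-free synthetic QMED
    coeffs = fit_log_linear(rows, qmed)
    print(coeffs)  # approximately [log(0.01), 0.8, 0.5]
    print(predict_qmed(coeffs, [100, 1000]))
```

Because the synthetic data are noise-free, the fit recovers the generating coefficients; with real data, the residual scatter around such a fit is the prediction uncertainty the paragraph refers to.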

Here, we explore the potential of explainable machine-learning models to estimate QMED at the global scale. Using data from approximately 8,500 catchments and more than 60 climatic, physiographic, and geomorphological descriptors, we train non-linear models (XGBoost and TabNet) to predict QMED in ungauged basins. To promote physically plausible behaviour, model training combines standard performance metrics with constraints on specific discharge (QMED per unit catchment area). A key feature of the approach is the extensive use of terrain and river-network descriptors derived from digital elevation models (DEMs), which can be computed consistently from widely available global elevation datasets.
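The abstract does not say exactly how the specific-discharge constraint enters training. One possible reading, sketched here under stated assumptions, is a loss that adds to the usual squared error on log(QMED) a penalty whenever the predicted specific discharge leaves a plausible band. The bounds, the weight and all function names are hypothetical placeholders, not values from the study.

```python
"""Sketch of a loss combining squared error with a specific-discharge penalty.

Assumptions (hypothetical, not taken from the abstract):
- the model predicts log(QMED), with QMED in m3/s;
- specific discharge q = QMED / AREA, with AREA in km2 (units m3/s/km2);
- Q_SPEC_MIN, Q_SPEC_MAX and PENALTY_WEIGHT are illustrative placeholders.
"""
import math

Q_SPEC_MIN = 0.001    # hypothetical lower plausibility bound, m3/s/km2
Q_SPEC_MAX = 10.0     # hypothetical upper plausibility bound, m3/s/km2
PENALTY_WEIGHT = 1.0  # hypothetical weight on the constraint term


def specific_discharge_penalty(log_qmed_pred, area_km2,
                               q_min=Q_SPEC_MIN, q_max=Q_SPEC_MAX):
    """Squared log-space distance by which predicted specific discharge
    falls outside [q_min, q_max]; zero inside the band."""
    log_q_spec = log_qmed_pred - math.log(area_km2)
    lo, hi = math.log(q_min), math.log(q_max)
    if log_q_spec < lo:
        return (lo - log_q_spec) ** 2
    if log_q_spec > hi:
        return (log_q_spec - hi) ** 2
    return 0.0


def constrained_loss(log_qmed_pred, log_qmed_obs, areas_km2, weight=PENALTY_WEIGHT):
    """Mean squared error on log(QMED) plus the weighted mean penalty."""
    n = len(log_qmed_obs)
    mse = sum((p - o) ** 2 for p, o in zip(log_qmed_pred, log_qmed_obs)) / n
    penalty = sum(specific_discharge_penalty(p, a)
                  for p, a in zip(log_qmed_pred, areas_km2)) / n
    return mse + weight * penalty
```

Gradient-boosting libraries such as XGBoost accept custom objectives supplied as per-sample gradients and Hessians, so a term like this would have to be differentiated before it could drive tree growth; it could equally be used only as a model-selection criterion alongside the performance metrics.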

Model interpretability is addressed using global and local explainability techniques, which identify the dominant controls on QMED and show how their importance varies spatially. On independent test data, the models show strong predictive skill (R² > 0.8; median absolute percentage error of about 30%). Notably, in many regions, models trained on large, globally diverse datasets outperform those trained solely on local data, even where substantial local records are available.
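The two skill scores, and one simple global explainability measure, can be sketched as below. Assumptions: R² is the coefficient of determination 1 − SS_res/SS_tot, computed here on QMED values as given (the abstract does not say whether raw or log-transformed QMED was used); the percentage error is |pred − obs|/obs × 100. The abstract does not name its explainability techniques, so permutation importance is swapped in purely as one common model-agnostic global method (SHAP values are another widely used option, particularly for local explanations). Function names are hypothetical.

```python
"""Sketch of test-set skill scores and permutation importance (illustrative only)."""
import random
import statistics


def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = statistics.fmean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def median_abs_pct_error(obs, pred):
    """Median over catchments of |pred - obs| / obs, in percent."""
    return statistics.median(abs(p - o) / o * 100.0 for o, p in zip(obs, pred))


def permutation_importance(model, rows, obs, n_repeats=5, seed=0):
    """Mean drop in R² when one descriptor column is shuffled across catchments,
    breaking its link to QMED while leaving the other descriptors intact.

    model: callable mapping one descriptor row (list of floats) to a QMED prediction.
    Returns one importance value per descriptor column.
    """
    rng = random.Random(seed)
    baseline = r_squared(obs, [model(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - r_squared(obs, [model(r) for r in permuted]))
        importances.append(statistics.fmean(drops))
    return importances
```

A descriptor the model ignores gets an importance of exactly zero, because shuffling it leaves every prediction unchanged; computing the same scores on regional subsets of the test data is one way to see how importance varies spatially.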

These results indicate that combining globally consistent physiographic information with interpretable, non-linear machine-learning models offers a promising alternative to traditional regional regression methods for QMED estimation, with potential benefits for flood risk assessment in data-sparse regions.

How to cite: Filipova, V., Leedal, D., and Clayton, S.: Global estimation of the median annual maximum flood (QMED) using explainable machine learning, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-5408, https://doi.org/10.5194/egusphere-egu26-5408, 2026.