Machine Learning Emulator for Large-Sample Hydrologic Model Calibration across Multiple FUSE Structures

Shadi Hatami; Nicolás Vásquez; Cyril Thébault; Wouter Knoben; Darri Eythorsson; Simon Michael Papalexiou; Martyn Clark

doi:https://doi.org/10.5194/egusphere-egu26-13767

EGU26-13767, updated on 27 Apr 2026

EGU General Assembly 2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

Shadi Hatami¹, Nicolás Vásquez¹, Cyril Thébault¹, Wouter Knoben¹, Darri Eythorsson¹, Simon Michael Papalexiou²,¹, and Martyn Clark¹

¹Department of Civil Engineering, Schulich School of Engineering, University of Calgary, Calgary, Canada
²Institute of Global Water Security, Hamburg University of Technology (TUHH), Hamburg, Germany

Large-sample hydrologic studies often require calibrating multiple model structures across numerous catchments, which can be computationally intensive with traditional optimization algorithms. Alternatively, recent advances in Machine Learning (ML) have enabled computationally frugal calibration strategies that rely on model emulators. Such approaches leverage information across sites, enabling improved calibration efficiency and parameter transferability to unseen catchments. However, exploring the parameter space using emulators is challenging because of emulator error and the need to explore high-dimensional parameter spaces. In this work, we investigate ML-based emulation and optimization strategies designed to improve parameter-space exploration, with the broader goal of supporting reproducible and computationally feasible large-sample hydrologic simulation. To this end, we use the Framework for Understanding Structural Errors (FUSE), which systematically represents alternative process formulations through multiple model configurations. Our framework is calibrated for 1,070 catchments across North America, spanning a wide range of hydroclimatic conditions. We develop Random Forest (RF) and Quantile Random Forest (QRF) emulators to approximate the relationship between model parameters, catchment attributes, and the Kling–Gupta Efficiency (KGE). While RF provides point estimates, QRF captures predictive uncertainty through conditional quantiles. These emulators are integrated into two calibration strategies: (1) a standard Genetic Algorithm (GA) that efficiently searches for high-performing parameter sets, and (2) a two-step hybrid optimizer that first performs a broad global search using Markov chain Monte Carlo sampling and then refines promising solutions using local gradient-based optimization. 
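For reference, the emulators are trained to predict the KGE. The abstract does not say which KGE variant is used, so the sketch below assumes the original 2009 formulation, KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2), where r is the linear correlation between simulated and observed flow, alpha is the ratio of their standard deviations, and beta is the ratio of their means. The function name `kge` and the toy flow values are illustrative only.

```python
import math


def kge(sim, obs):
    """Kling-Gupta Efficiency, assuming the 2009 formulation (illustrative)."""
    n = len(obs)
    mu_s = sum(sim) / n
    mu_o = sum(obs) / n
    # Population standard deviations and covariance
    sd_s = math.sqrt(sum((s - mu_s) ** 2 for s in sim) / n)
    sd_o = math.sqrt(sum((o - mu_o) ** 2 for o in obs) / n)
    cov = sum((s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)) / n
    r = cov / (sd_s * sd_o)   # linear correlation
    alpha = sd_s / sd_o       # variability ratio
    beta = mu_s / mu_o        # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Toy streamflow values, invented for illustration
observed = [1.0, 3.0, 2.0, 5.0, 4.0]
perfect = kge(observed, observed)
doubled = kge([2.0 * q for q in observed], observed)
```

A perfect simulation scores 1. Doubling every flow keeps r = 1 but sets alpha = beta = 2, so the score drops to 1 - sqrt(2).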
By more fully evaluating the parameter space and avoiding premature convergence, the two-step strategy captures a more diverse ensemble of near-optimal parameter solutions. This diversity is particularly valuable for emulator-based calibration, as it allows the emulator to be retrained iteratively on a broader range of the parameter space, improving robustness and reducing reliance on narrowly sampled regions. These improvements are expected to support more stable parameter estimates and improved hydrologic simulations across a large sample of catchments. Overall, this hybrid framework enables reproducible and computationally efficient calibration across multiple model structures and hundreds of catchments, providing a scalable pathway for integrating ML emulators into large-sample hydrologic modeling workflows.
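The two-step idea can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a hypothetical smooth two-parameter surrogate (`toy_emulator`) in place of a trained RF or QRF emulator, a random-walk Metropolis sampler as the MCMC step, and finite-difference gradient ascent as the local refinement. All names, step sizes, and the temperature are illustrative choices.

```python
import math
import random


def toy_emulator(theta):
    """Hypothetical smooth stand-in for an emulator's predicted KGE.

    Peaks at 1.0 for theta = (0.3, 0.7); parameters are scaled to [0, 1].
    """
    return 1.0 - ((theta[0] - 0.3) ** 2 + (theta[1] - 0.7) ** 2)


def in_bounds(theta):
    return all(0.0 <= t <= 1.0 for t in theta)


def metropolis_search(score, n_steps=2000, step=0.1, temperature=0.05, seed=0):
    """Step 1: broad search with a random-walk Metropolis sampler."""
    rng = random.Random(seed)
    current = [rng.random(), rng.random()]
    current_score = score(current)
    visited = {tuple(current): current_score}
    for _ in range(n_steps):
        proposal = [t + rng.gauss(0.0, step) for t in current]
        if not in_bounds(proposal):
            continue  # reject proposals outside the parameter box
        delta = score(proposal) - current_score
        # Accept uphill moves always, downhill moves with prob exp(delta / T)
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = proposal, current_score + delta
            visited[tuple(current)] = current_score
    return visited


def refine(score, theta, lr=0.1, n_iter=200, h=1e-4):
    """Step 2: local gradient ascent using central finite differences."""
    theta = list(theta)
    for _ in range(n_iter):
        grad = []
        for i in range(len(theta)):
            up, down = list(theta), list(theta)
            up[i] += h
            down[i] -= h
            grad.append((score(up) - score(down)) / (2 * h))
        # Take a gradient step and clip back into the parameter box
        theta = [min(1.0, max(0.0, t + lr * g)) for t, g in zip(theta, grad)]
    return theta


visited = metropolis_search(toy_emulator)
# Keep the five best distinct samples as starting points for refinement
starts = sorted(visited, key=visited.get, reverse=True)[:5]
refined = [refine(toy_emulator, start) for start in starts]
best = max(refined, key=toy_emulator)
```

On this single-peaked toy surface every start refines to the same optimum. With a real emulator the refined set could differ from start to start, which is the kind of diversity the text describes. A tree-based emulator is piecewise constant, so a finite-difference gradient on it would need smoothing or a larger step than the one assumed here.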

How to cite: Hatami, S., Vásquez, N., Thébault, C., Knoben, W., Eythorsson, D., Papalexiou, S. M., and Clark, M.: Machine Learning Emulator for Large-Sample Hydrologic Model Calibration across Multiple FUSE Structures, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-13767, https://doi.org/10.5194/egusphere-egu26-13767, 2026.