Comparing multi-model approaches to simulate streamflow across a large sample of catchments

Cyril Thébault; Wouter J. M. Knoben; Nans Addor; Martyn P. Clark

doi:https://doi.org/10.5194/egusphere-egu26-5934

EGU26-5934, updated on 13 Mar 2026

EGU General Assembly 2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

Cyril Thébault¹, Wouter J. M. Knoben¹, Nans Addor², and Martyn P. Clark¹

¹University of Calgary, Department of Civil Engineering, Calgary, Canada
²Fathom, Bristol, UK

The research and operational communities have developed many models to represent the complexity and diversity of hydrological processes and to meet specific application needs. Previous studies have shown the limitations of a one-size-fits-all model structure (e.g., poor representation of local conditions, limited process representation, scalability issues). To address these limitations and improve model performance, multi-model approaches have been developed that select and/or combine outputs from an ensemble of models (e.g., catchment-specific selection based on performance scores, or weighting of ensemble members using methods such as Bayesian model averaging).

This study compares multi-model methods for improving streamflow simulation. Specifically, we evaluated five approaches: a mosaic (i.e., per-catchment selection) based on performance, a mosaic based on performance equivalence, a static combination in time and space (i.e., a fixed combination applied across all catchments), a static combination in time only (i.e., per-catchment combination), and a dynamic combination (i.e., evolving over time and space). To this end, an ensemble of 78 models was built with the Framework for Understanding Structural Errors (FUSE) and applied to 559 catchments from the CAMELS dataset across the contiguous USA. The evaluation is based on a composite criterion that accounts, to some extent, for both high- and low-flow conditions. Sampling uncertainty (i.e., the variability in performance scores due to the choice of evaluation period) was assessed using a bootstrap-jackknife method.
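As an illustration only, the sketch below contrasts a performance-based mosaic with a per-catchment static combination for a single synthetic catchment. It is not the study's implementation: the abstract does not define the composite criterion, the combination weights, or the details of the bootstrap-jackknife, so everything here is an assumption. The composite score is taken as the mean of the Kling-Gupta efficiency (KGE) on flows and on inverse flows, the static weights come from ordinary least squares, and a plain year-block bootstrap stands in for the bootstrap-jackknife. All function names and the three synthetic "models" are hypothetical.

```python
import numpy as np

def kge(sim, obs):
    """Kling-Gupta efficiency; 1 is a perfect score."""
    r = np.corrcoef(sim, obs)[0, 1]          # timing / dynamics
    alpha = np.std(sim) / np.std(obs)        # variability ratio
    beta = np.mean(sim) / np.mean(obs)       # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

def composite_score(sim, obs, eps=0.01):
    """ASSUMED composite: KGE on flows (high-flow emphasis) averaged with
    KGE on inverse flows (low-flow emphasis). Not the study's definition."""
    return 0.5 * (kge(sim, obs) + kge(1.0 / (sim + eps), 1.0 / (obs + eps)))

def mosaic_select(sims, obs):
    """Performance-based mosaic: keep the best-scoring model for this catchment.
    sims has shape (n_models, n_time)."""
    scores = np.array([composite_score(s, obs) for s in sims])
    return int(np.argmax(scores)), scores

def static_weights(sims, obs):
    """Static (time-invariant) per-catchment weights from ordinary least squares.
    ASSUMED weighting scheme; the abstract does not say how weights are derived."""
    w, *_ = np.linalg.lstsq(sims.T, obs, rcond=None)
    return w

def bootstrap_scores(sim, obs, days_per_year=365, n_boot=50, seed=0):
    """Year-block bootstrap of the score: a simplified stand-in for the
    bootstrap-jackknife named in the text."""
    rng = np.random.default_rng(seed)
    n_years = len(obs) // days_per_year
    sim_y = sim[: n_years * days_per_year].reshape(n_years, days_per_year)
    obs_y = obs[: n_years * days_per_year].reshape(n_years, days_per_year)
    out = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n_years, size=n_years)  # resample whole years
        out[b] = composite_score(sim_y[idx].ravel(), obs_y[idx].ravel())
    return out

# Synthetic single-catchment example (two years of daily flow, arbitrary units).
t = np.arange(730)
season = 2 * np.pi * t / 365
obs = 2.0 + np.sin(season)
sims = np.stack([
    1.05 * obs,                    # hypothetical model 0: slight positive bias
    2.0 + 0.5 * np.sin(season),    # hypothetical model 1: damped variability
    2.0 + np.sin(season - 1.0),    # hypothetical model 2: phase-lagged
])

best, scores = mosaic_select(sims, obs)
weights = static_weights(sims, obs)
combined = weights @ sims          # weighted sum of model outputs
boot = bootstrap_scores(sims[best], obs)

print("mosaic picks model", best, "with score", round(float(scores[best]), 3))
print("static combination score", round(float(composite_score(combined, obs)), 3))
print("bootstrap score range", float(boot.min()), float(boot.max()))
```

Note that the weights here are fitted and scored on the same period, which flatters the combination; a real comparison would fit on a calibration period and score on an independent evaluation period. A dynamic combination would additionally let the weights vary over time.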

Results show that differences between multi-model approaches are small, even when complexity varies greatly (e.g., number of models per catchment, variability in space and time, computational time). Benefits compared with a one-size-fits-all model are not as large as expected, especially after considering sampling uncertainty. While perhaps surprising, this underscores the strength of the one-size-fits-all model selection used here, where model choice is guided by performance across a large ensemble of models and a large sample of catchments, rather than made arbitrarily or for convenience. These findings may also reflect limitations of common evaluation metrics, which may not fully capture the benefits of more complex approaches.

How to cite: Thébault, C., Knoben, W. J. M., Addor, N., and Clark, M. P.: Comparing multi-model approaches to simulate streamflow across a large sample of catchments, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-5934, https://doi.org/10.5194/egusphere-egu26-5934, 2026.