- 1University of Zurich, Department of Geography, Zürich, Switzerland (jan.seibert@geo.uzh.ch)
- 2Eawag, Department Water Resources & Drinking Water, Dübendorf, Switzerland
Large-sample datasets have become available for many regions worldwide, and their availability has changed hydrological catchment modelling. Assessing model performance is an essential component of most large-sample applications, and an important question is how to interpret the values of performance measures. We have previously shown that the performance of an uncalibrated bucket-type model varies considerably across regions. In humid or snow-dominated catchments, an uncalibrated model can reach NSE values of 0.8 or higher, values that are often considered good. This implies that judging model performance against a fixed threshold, as is sometimes suggested in the literature, is inappropriate. Instead, one should recognise that, given the local hydroclimatic conditions and the available data quality, the performance that can be expected from any model varies widely between catchments. At the same time, a perfect fit (a value of 1) is usually impossible to achieve due to model and data errors and uncertainties. It is therefore helpful to compare model performance to lower and upper benchmarks.
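The benchmark-based interpretation described above can be sketched in code. The following is an illustrative example, not part of the study itself: the `nse` function implements the standard Nash–Sutcliffe efficiency, while the benchmark values and the rescaling into a benchmark-relative skill score are hypothetical placeholders for catchment-specific values.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the squared model
    errors to the variance of the observations. 1 is a perfect fit; 0 is
    equivalent to predicting the mean of the observations."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Judge a model score relative to catchment-specific benchmarks rather
# than a fixed threshold. The benchmark values below are purely
# illustrative, not results from the study.
lower_benchmark = 0.55   # e.g. from an uncalibrated / random-parameter model
upper_benchmark = 0.92   # e.g. an estimate of the achievable maximum
model_nse = 0.80
skill = (model_nse - lower_benchmark) / (upper_benchmark - lower_benchmark)
```

Rescaling a raw NSE between the two benchmarks makes scores comparable across catchments: the same raw value of 0.8 may indicate little skill in a humid catchment but substantial skill in an arid one.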
The purpose of this study was twofold. First, we examined how to compute lower performance bounds from randomly chosen parameter sets, including guidance on appropriate ensemble sizes, the effects of parameter ranges, and the selection of parameter sets. We also examined the relationships between the lower and upper benchmarks and catchment characteristics. Second, we used these findings to compute lower and upper benchmarks for many of the existing CAMELS datasets. By providing these values to the modelling community, we aim to facilitate the broader use of lower and upper benchmarks in large-sample hydrological modelling studies, as they provide a basis for benchmarking model performance across the various CAMELS datasets. This allows model performance to be assessed in light of what one could and should expect for a particular catchment. Such assessments are important, for instance, when evaluating the adequacy of model structures or comparing approaches for prediction in ungauged basins.
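A minimal sketch of the random-parameter approach to a lower benchmark is shown below. Everything here is assumed for illustration: the toy single-bucket model, the synthetic forcing and pseudo-observations, the parameter ranges, the ensemble size, and the choice of the ensemble median as the benchmark statistic (the appropriate choices are precisely what the study investigates).

```python
import numpy as np

rng = np.random.default_rng(42)

def bucket_model(precip, k, smax):
    """Toy single-bucket model: storage fills with precipitation, water
    above capacity smax is lost, and outflow drains linearly at rate k."""
    s = 0.0
    q = np.empty_like(precip)
    for t, p in enumerate(precip):
        s = min(s + p, smax)   # fill the bucket, discard any excess
        q[t] = k * s           # linear-reservoir outflow
        s -= q[t]
    return q

def nse(obs, sim):
    """Nash-Sutcliffe efficiency of sim against obs."""
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - np.mean(obs)) ** 2)

# Synthetic daily forcing and pseudo-observations (stand-ins for real data).
precip = rng.gamma(shape=0.5, scale=4.0, size=365)
q_obs = bucket_model(precip, k=0.3, smax=50.0)

# Draw random parameter sets from a priori ranges and score each one.
n_sets = 1000  # the appropriate ensemble size is itself a study question
k_vals = rng.uniform(0.01, 0.9, n_sets)
smax_vals = rng.uniform(10.0, 500.0, n_sets)
scores = np.array([nse(q_obs, bucket_model(precip, k, s))
                   for k, s in zip(k_vals, smax_vals)])

# One possible lower benchmark: the median NSE of the random ensemble.
lower_benchmark = np.median(scores)
```

The distribution of `scores`, not just its median, is informative: its spread reflects how sensitive performance in a given catchment is to the parameter values, which ties the lower benchmark to the local hydroclimatic conditions.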
How to cite: Seibert, J., Vis, M., and Pool, S.: Setting the Bar: Benchmarks for Model Performances in Large-Sample Hydrology, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-12908, https://doi.org/10.5194/egusphere-egu26-12908, 2026.