EGU General Assembly 2021
© Author(s) 2021. This work is distributed under
the Creative Commons Attribution 4.0 License.

Learning from one’s errors: A data-driven approach for mimicking an ensemble of hydrological model residuals

John M. Quilty1 and Anna E. Sikorska-Senoner2
  • 1Department of Civil and Environmental Engineering, University of Waterloo, Waterloo, Canada
  • 2Department of Geography, University of Zurich, Zürich, Switzerland

Despite significant efforts to improve the calibration of hydrological models, when applied to real-world case studies, model errors (residuals) remain. These residuals impair flow estimates and can lead to unreliable design, management, and operation of water resources systems. Since these residuals are auto-correlated, they should be treated with appropriate methods that do not require limiting assumptions (e.g., that the residuals follow a Gaussian distribution).

This study introduces a novel data-driven framework to account for residuals of hydrological models. Our framework relies on a conceptual-data-driven approach (CDDA) that integrates two models, i.e., a hydrological model (HM) and a data-driven (i.e., machine learning) model (DDM), to simulate an ensemble of residuals from the HM. In the first part of the CDDA, an HM is used to generate an ensemble of streamflow simulations for different parameter sets. Afterwards, the residuals associated with each simulation are computed and a DDM is developed to predict them. Finally, the original streamflow simulations are coupled with the DDM predictions to produce the CDDA output, an improved ensemble of streamflow simulations. The proposed CDDA is a useful approach since it respects hydrological processes via the HM and profits from the DDM’s ability to simulate the complex (nonlinear) relationship between residuals and input variables.
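The three CDDA steps described above (HM ensemble, residual model, coupling) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' setup: `toy_hm` is a hypothetical one-parameter linear reservoir standing in for the HM, the data are synthetic, one DDM is fitted per ensemble member, and an ordinary least-squares fit is swapped in for the machine learning DDM so that the example needs only NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_hm(precip, k):
    """Hypothetical linear reservoir standing in for the HM (not a real model)."""
    storage = 0.0
    flow = np.empty_like(precip)
    for t, p in enumerate(precip):
        storage += p
        flow[t] = k * storage      # outflow is a fixed fraction of storage
        storage -= flow[t]
    return flow

# Synthetic forcing and "observations" (assumed data, for illustration only).
n_days = 400
precip = rng.gamma(shape=0.5, scale=4.0, size=n_days)
# The observations contain a quick-flow term that the toy HM lacks.
observed = toy_hm(precip, 0.35) + 0.1 * precip

# DDM inputs for day t: intercept, precipitation on days t and t-1,
# and observed streamflow on day t-1 (lagged inputs only).
X = np.column_stack([
    np.ones(n_days - 1),
    precip[1:],
    precip[:-1],
    observed[:-1],
])
target_obs = observed[1:]
n_train = len(X) // 2              # first half is used to fit the DDM

# Step 1: HM ensemble, one simulation per parameter set.
param_sets = [0.2, 0.3, 0.4, 0.5]
hm_ensemble = np.array([toy_hm(precip, k)[1:] for k in param_sets])

# Steps 2 and 3: fit a residual model per member, then add its
# predictions back onto the HM simulation.
cdda_ensemble = np.empty_like(hm_ensemble)
for i, sim in enumerate(hm_ensemble):
    residuals = target_obs - sim
    coef, *_ = np.linalg.lstsq(X[:n_train], residuals[:n_train], rcond=None)
    cdda_ensemble[i] = sim + X @ coef

def train_rmse(sim):
    """Root mean squared error over the fitting period."""
    return float(np.sqrt(np.mean((target_obs[:n_train] - sim[:n_train]) ** 2)))

for k, raw, corrected in zip(param_sets, hm_ensemble, cdda_ensemble):
    print(f"k={k}: HM RMSE {train_rmse(raw):.3f}, CDDA RMSE {train_rmse(corrected):.3f}")
```

Because the stand-in residual model includes an intercept and is fitted by least squares, the corrected simulations cannot have a larger error than the raw HM simulations over the fitting period. Performance on held-out data is not guaranteed in the same way and would have to be checked separately.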

To explore the utility of CDDA, we focus principally on identifying the best DDM and input variables to mimic HM residuals. For this purpose, we explored eight different DDM variants and multiple input variables (observed precipitation, air temperature, and streamflow) at different lag times prior to the simulation day. Based on a case study involving three Swiss catchments, the proposed CDDA framework shows strong promise for improving ensemble streamflow simulations, reducing the mean continuous ranked probability score by 16–29% compared to the standalone HM. eXtreme Gradient Boosting (XGB) and Random Forests (RF), each using 29 input variables, were found to be the strongest predictors of the HM residuals. However, similar performance could be achieved by selecting only the six most important (of the original 29) input variables and re-training the XGB and RF models.
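The mean continuous ranked probability score (CRPS) used for the comparison above can be computed from an ensemble in several ways. The sketch below uses the common empirical estimator, CRPS = E|X − y| − ½ E|X − X′|, where X and X′ are ensemble members and y is the observation. The abstract does not state which estimator was used, so this choice and the function names `ensemble_crps` and `mean_crps_reduction` are assumptions.

```python
import numpy as np

def ensemble_crps(ensemble, observed):
    """Empirical CRPS per time step.

    ensemble: array of shape (n_members, n_times)
    observed: array of shape (n_times,)
    """
    ensemble = np.asarray(ensemble, dtype=float)
    observed = np.asarray(observed, dtype=float)
    # Mean absolute difference between each member and the observation.
    term_obs = np.mean(np.abs(ensemble - observed), axis=0)
    # Mean absolute difference between all pairs of members.
    pairwise = np.abs(ensemble[:, None, :] - ensemble[None, :, :])
    term_spread = np.mean(pairwise, axis=(0, 1))
    return term_obs - 0.5 * term_spread

def mean_crps_reduction(baseline, improved, observed):
    """Percentage reduction in mean CRPS of `improved` relative to `baseline`."""
    base = ensemble_crps(baseline, observed).mean()
    new = ensemble_crps(improved, observed).mean()
    return 100.0 * (base - new) / base
```

With a single-member ensemble the score reduces to the absolute error, which gives a quick sanity check. A lower mean CRPS indicates a better ensemble, so a positive value from `mean_crps_reduction` means the second ensemble improves on the first.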

Additional experimentation shows that by converting CDDA to a stochastic framework (i.e., to account for important uncertainty sources), significant gains in model performance can be achieved.

How to cite: Quilty, J. M. and Sikorska-Senoner, A. E.: Learning from one’s errors: A data-driven approach for mimicking an ensemble of hydrological model residuals, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13244, 2021.

