EGU24-11778, updated on 09 Mar 2024
https://doi.org/10.5194/egusphere-egu24-11778
EGU General Assembly 2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

How much data is needed for hydrological modeling? 

Bjarte Beil-Myhre1, Bernt Viggo Matheussen2, and Rajeev Shrestha3
  • 1Å Energi - RMT - Technology and Development, Kjøita -18, 4630 Kristiansand, Norway (bjarte.beil-myhre@aenergi.no)
  • 2Å Energi - RMT - Technology and Development, Kjøita -18, 4630 Kristiansand, Norway (bernt.viggo.matheussen@aenergi.no)
  • 3Å Energi - RMT - Technology and Development, Kjøita -18, 4630 Kristiansand, Norway (rajeev.shrestha@aenergi.no)

Hydrological modeling has changed markedly over the past decade, driven largely by the data-driven approach of Kratzert et al. (2018), which uses Long Short-Term Memory (LSTM) networks (Hochreiter & Schmidhuber, 1997). These networks exploit large datasets and complex model structures and outperform traditional hydrological models, although they are computationally expensive to train. This raises the question of how much data, and how complex a dataset, is needed to build a reliable and robust hydrological model.


In this study, we use hybrid models that combine the strengths of classical hydrological models with the data-driven approach. The hybrid models build on the LSTM models developed by Kratzert et al., combined with classical hydrological models such as the Statkraft Hydrology Forecasting Toolbox (SHyFT) from Statkraft and the Distributed Regression Hydrological Model (DRM) by Matheussen at Å Energi. The models were applied to sixty-five catchments in southern Norway that differ in their characteristics and data records. Our analysis assesses the performance of these models under various scenarios of data availability, considering factors such as:


- Varying numbers of catchments selected based on size or location.
- The duration of the data records utilized for model calibration.
- Specific catchment characteristics and outputs from classical models employed as inputs (e.g., area, latitude, longitude, or additional variables).
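As an illustration only, the three factors above could be expressed as a small scenario builder. The following is a minimal sketch, not the authors' implementation: the function names `build_inputs` and `make_scenario`, the dictionary layout of the catchment data, the ranking of catchments by area, the daily time step, and the synthetic values are all assumptions made for this example.

```python
import numpy as np


def build_inputs(forcings, classical_sim, static_attrs, attr_names, keep_attrs):
    """Stack per-timestep features for one catchment (hypothetical layout).

    forcings:      (T, F) array of meteorological forcings
    classical_sim: (T,) array of output from a classical model
    static_attrs:  (A,) array of catchment characteristics
    attr_names:    list of A names matching static_attrs
    keep_attrs:    names of the characteristics to keep as inputs
    """
    idx = [attr_names.index(name) for name in keep_attrs]
    static = np.asarray(static_attrs, dtype=float)[idx]
    # Repeat the static characteristics at every time step.
    static_tiled = np.tile(static, (forcings.shape[0], 1))
    return np.concatenate([forcings, classical_sim[:, None], static_tiled], axis=1)


def make_scenario(catchments, attr_names, n_catchments, n_years, keep_attrs,
                  sort_by="area", steps_per_year=365):
    """Build one data-availability scenario (n_years must be at least 1).

    Assumption: catchments are ranked by one characteristic (largest first)
    and only the most recent n_years of each record are kept.
    """
    col = attr_names.index(sort_by)
    ranked = sorted(catchments, key=lambda cid: catchments[cid]["attrs"][col],
                    reverse=True)
    n_steps = n_years * steps_per_year
    scenario = {}
    for cid in ranked[:n_catchments]:
        c = catchments[cid]
        scenario[cid] = build_inputs(c["forcings"][-n_steps:], c["sim"][-n_steps:],
                                     c["attrs"], attr_names, keep_attrs)
    return scenario


# Synthetic stand-in data: three catchments, two years of daily values.
rng = np.random.default_rng(0)
attr_names = ["area", "latitude", "longitude"]
catchments = {
    cid: {"forcings": rng.random((730, 2)),
          "sim": rng.random(730),
          "attrs": np.array([area, lat, lon])}
    for cid, area, lat, lon in [("a", 10.0, 58.2, 7.9),
                                ("b", 50.0, 58.6, 7.4),
                                ("c", 30.0, 59.0, 8.1)]
}

# Two largest catchments, one year of data, area as the only characteristic.
scenario = make_scenario(catchments, attr_names, n_catchments=2, n_years=1,
                         keep_attrs=["area"])
print({cid: x.shape for cid, x in scenario.items()})
# {'b': (365, 4), 'c': (365, 4)}
```

Each scenario array could then be passed to whatever training routine is in use, so that the number of catchments, the record length, and the input set are varied independently of one another.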


Preliminary findings indicate that the model inputs can be reduced substantially without compromising model performance. With a limited set of catchment characteristics, performance approaches that of the model using all characteristics, while mitigating added uncertainty and model complexity. Increasing the length of the data records improves model performance, but with diminishing returns. Furthermore, adding more catchments to the model does not necessarily yield a proportional improvement in overall model performance. These findings help clarify the interplay between data, model complexity, and performance in hydrological modeling.


The novelty of this research is that the hybrid models can be applied in a relatively small area, with few catchments and a limited number of climate stations and catchment characteristics compared to the CAMELS setup used by Kratzert et al., and still achieve improved results.

How to cite: Beil-Myhre, B., Matheussen, B. V., and Shrestha, R.: How much data is needed for hydrological modeling?, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11778, https://doi.org/10.5194/egusphere-egu24-11778, 2024.