EGU23-12315
https://doi.org/10.5194/egusphere-egu23-12315
EGU General Assembly 2023
© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Meta-modeling with data-driven methods in hydrology

Tobias Krueger1, Mark Somogyvari1, Ute Fehrenbach2, and Dieter Scherer2
  • 1Humboldt-Universität zu Berlin, Geography Department & IRI THESys, Berlin, Germany (tobias.krueger@hu-berlin.de)
  • 2Technische Universität Berlin, Chair of Climatology, Institute of Ecology, Berlin, Germany

Process-based models are today's standard tools for understanding how physical systems work. There are situations, however, in which system understanding is not the primary focus and it is worth replacing an existing process-based model with a computationally more efficient meta-model (or emulator), i.e. a proxy designed for a specific application. In our research we have explored data-driven meta-modeling approaches for applications in hydrology, each designed to answer a specific research question.

To find a suitable meta-modeling approach, we experimented with a set of data-driven methods. We employed a multi-fidelity modeling approach in which we gradually increased the complexity of our models. In total, five approaches were investigated: a linear model fitted by ordinary least squares regression, a linear model fitted with two different Bayesian methods (Hamiltonian Monte Carlo and transdimensional Monte Carlo), and two machine learning approaches (a dense artificial neural network and a long short-term memory (LSTM) neural network).

For method development we used the project's case study, the Groß Glienicker Lake. This is a glacial lake near Berlin whose water level has shown a strong negative trend in recent decades. Supported by the observation model from the Central European Refined analysis, we had a daily, high-resolution meteorological dataset (precipitation and actual evapotranspiration) and lake level observations for 16 years.

All models share the same design: they predict the lake level change one day ahead from the precipitation and evapotranspiration data of the previous 70 days. This interval was selected after an extensive parameter test with the linear model. By predicting the change in stored water, we linearize the problem, and by using a longer time interval we allow the methods to compensate automatically for any lag or memory effects inside the catchment. The methods are evaluated by comparing how well the reconstructed lake levels fit the observed ones.
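The simplest of these designs, the linear model with ordinary least squares, can be illustrated with a minimal sketch. The code below is not the authors' implementation: the function names, the intercept term, and the synthetic forcing data are assumptions made purely for illustration. Only the 70-day window and the choice of daily level change as the target come from the description above.

```python
import numpy as np

WINDOW = 70  # days of forcing history, as stated in the abstract


def build_lagged_features(precip, evap, window=WINDOW):
    """One row per predicted day: the previous `window` days of both forcings."""
    rows = []
    for t in range(window, len(precip)):
        rows.append(np.concatenate([precip[t - window:t], evap[t - window:t]]))
    return np.asarray(rows)


def fit_linear_emulator(precip, evap, level, window=WINDOW):
    """Ordinary least squares fit of the daily lake level change."""
    X = build_lagged_features(precip, evap, window)
    # Target: change from day t-1 to day t, for t = window .. n-1.
    dy = level[window:] - level[window - 1:-1]
    # The intercept column is an assumption, not something the abstract states.
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, dy, rcond=None)
    return coef


def reconstruct_levels(coef, precip, evap, initial_level, window=WINDOW):
    """Accumulate the predicted daily changes into a lake level series."""
    X = build_lagged_features(precip, evap, window)
    A = np.column_stack([X, np.ones(len(X))])
    return initial_level + np.cumsum(A @ coef)


# Synthetic, noise-free example data (hypothetical, not the lake observations).
rng = np.random.default_rng(0)
n_days = 1000
precip = rng.gamma(0.5, 2.0, n_days)
evap = rng.uniform(0.0, 3.0, n_days)

# Hypothetical "true" catchment memory: exponentially decaying lag weights.
kernel = 1e-3 * np.exp(-np.arange(WINDOW)[::-1] / 10.0)
level = np.zeros(n_days)
for t in range(WINDOW, n_days):
    change = kernel @ precip[t - WINDOW:t] - kernel @ evap[t - WINDOW:t]
    level[t] = level[t - 1] + change

coef = fit_linear_emulator(precip, evap, level)
reconstructed = reconstruct_levels(coef, precip, evap, level[WINDOW - 1])
rmse = np.sqrt(np.mean((reconstructed - level[WINDOW:]) ** 2))
```

Because the synthetic level changes are exactly linear in the lagged forcings, the fit here is nearly perfect. With real observations the residual would reflect noise and any nonlinearity the linear model cannot capture, and errors in the predicted changes would accumulate in the reconstructed levels.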

As expected, increasing the model and inversion complexity improves the quality of the reconstruction. The use of nonlinear models was especially advantageous: the artificial neural network outperformed every other method. However, in our case study these improvements were relatively small, meaning that in practice the simplest linear method was preferable owing to its computational efficiency, robustness, and ease of use and interpretation.

In this presentation we discuss the challenges of data preparation and optimal model design (especially the memory of the hydrological system); by contrast, finding the hyperparameters of the individual methods was relatively straightforward. Our results suggest that problem linearization should be a preferred first step in any meta-modeling application, as it also helps the training of nonlinear models. We also discuss data requirements, because we found that our dataset was too small for the most complex method, the LSTM, which yielded unstable results and learned spurious background trends.

How to cite: Krueger, T., Somogyvari, M., Fehrenbach, U., and Scherer, D.: Meta-modeling with data-driven methods in hydrology, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-12315, https://doi.org/10.5194/egusphere-egu23-12315, 2023.