EGU General Assembly 2020
© Author(s) 2021. This work is distributed under
the Creative Commons Attribution 4.0 License.

Machine Learning is Central to the Future of Hydrological Modeling

Grey Nearing1, Frederik Kratzert2, Craig Pelissier3,4,5, Daniel Klotz2, Jonathan Frame1, and Hoshin Gupta6
  • 1University of Alabama, Department of Geological Sciences, Tuscaloosa, AL, USA
  • 2LIT AI Lab & Institute for Machine Learning, Johannes Kepler University Linz, Austria
  • 3NASA Goddard Space Flight Center, NASA Center for Climate Simulation, Greenbelt, MD, USA
  • 4University of Maryland Baltimore County, Department of Computer Science and Electrical Engineering, Baltimore, MD, USA
  • 5Science Systems Applications Inc., Lanham, MD, USA
  • 6University of Arizona, Department of Hydrology and Water Resources, Tucson, AZ, USA

This talk addresses aspects of three of the seven UPH themes: (i) time variability and change, (ii) space variability and scaling, and (iii) modeling methods. 

During the community contribution phase of the 23 Unsolved Problems effort, one of the suggested questions was “Does Machine Learning have a real role in hydrological modeling?” The final UPH paper claimed that “Most hydrologists would probably agree that [extrapolating to changing conditions] will require a more process-based rather than calibration-based approach as calibrated conceptual models do not usually extrapolate well.” In this talk we will present a collection of recent experiments that demonstrate how catchment models based on deep learning can account for both temporal nonstationarity and spatial information transfer (e.g., from gauged to ungauged catchments), often achieving significantly superior predictive performance compared to other state-of-the-art (process-based) modeling strategies, while also providing interpretable results. This is because deep learning can learn, exploit, and explain catchment and hydrologic similarity in ways and with accuracies that the community has not been able to achieve using traditional methods.

We argue that the results we have obtained motivate a path forward for hydrological modeling that centers around ‘physics-informed machine learning.’ Future model development might focus on building hybrid (AI + process-informed) models with three objectives: (i) integrating known catchment behaviors into models that are also able to learn directly from data, (ii) building explainable deep learning models that allow us to extract scientific insights, and (iii) building hybrid models that are also able to simulate unobserved or sparsely observed variables. We argue further that while the sentiments expressed in the UPH paper about process-based modeling are common, the community currently lacks an evidence-based understanding of where and when process-based understanding is important for future predictions, and that addressing this question in a meaningful way will require true hybrids between different modeling approaches.

We will conclude by providing two fundamentally novel examples of physics-informed machine learning applied to catchment-scale and point-scale modeling: (i) conservation-constrained neural network architectures applied to rainfall-runoff processes, and (ii) integrating machine learning into existing process-based models to learn unmodeled hydrologic behaviors. We will show results from applying these strategies to the CAMELS dataset in a rainfall-runoff context, and also to FluxNet soil moisture data sets.
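As a rough illustration of what a conservation constraint can mean in a rainfall-runoff setting, the sketch below routes rainfall through a few storage cells so that no water is created or destroyed. It is not the architecture presented in the talk. The function names, the number of cells, and the fixed gate scores are hypothetical stand-ins for quantities that a trained network would compute from its inputs.

```python
import math

# Hypothetical fixed gate scores; a trained network would compute these from data.
IN_SCORES = [0.5, 0.0, -0.5]
OUT_SCORES = [-1.0, -2.0, -3.0]


def softmax(values):
    """Turn scores into non-negative weights that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(value):
    """Map a score to a release fraction strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-value))


def conservative_step(storage, rain):
    """Advance the storage cells by one time step while conserving water."""
    # Input weights sum to 1, so all incoming rain is placed in some cell.
    in_weights = softmax(IN_SCORES)
    filled = [s + w * rain for s, w in zip(storage, in_weights)]
    # Each cell releases a fraction of its content and keeps the rest.
    outflow = [f * sigmoid(o) for f, o in zip(filled, OUT_SCORES)]
    new_storage = [f - q for f, q in zip(filled, outflow)]
    return new_storage, sum(outflow)


def simulate(rain_series):
    """Run the step over a rainfall series; return final storage and runoff."""
    storage = [0.0] * len(IN_SCORES)
    runoff = []
    for rain in rain_series:
        storage, q = conservative_step(storage, rain)
        runoff.append(q)
    return storage, runoff


if __name__ == "__main__":
    rain = [5.0, 0.0, 12.5, 3.0, 0.0]
    storage, runoff = simulate(rain)
    # Water balance: total rain equals final storage plus total runoff.
    print(sum(rain), sum(storage) + sum(runoff))
```

Because the input weights are normalized and each cell only splits its content between release and retention, the water balance holds by construction rather than being learned from data.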

How to cite: Nearing, G., Kratzert, F., Pelissier, C., Klotz, D., Frame, J., and Gupta, H.: Machine Learning is Central to the Future of Hydrological Modeling, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-6111, 2020.


Comments on the display

AC: Author Comment | CC: Community Comment

displays version 1 – uploaded on 28 Apr 2020
  • CC1: Comment on EGU2020-6111, Heidi Kreibich, 05 May 2020

    Dear Grey, interesting discussion during the chat. I wonder if ML could also be used for socio-hydrological modelling. I’m leading a community activity bringing together modellers and data providers. We want to tackle the UPH: “How can we extract information from available data on human and water systems in order to inform the building process of socio-hydrological models and conceptualisations?”. We will compile a Panta Rhei Benchmark Dataset, i.e. time series of data of water and human interaction in various catchments and regions, which shall be used to (further) develop and apply socio-hydrological models. So far we are considering stylized hydrological models based on differential equations as well as agent-based models. However, maybe also models based on ML, e.g. deep learning could be used. What do you think? Greetings Heidi Kreibich, GFZ

    • AC1: Reply to CC1, Grey Nearing, 05 May 2020

      I think there is potential. If there are patterns in the human interactions, then ML will find them. You could possibly start just by asking whether there are such patterns in the data, and simply train an ML model. If the answer is 'yes' and there are patterns or regularities in the interactions, then you might be able to use something like reinforcement learning to 'learn' action strategies. I would love to try this, and look forward to seeing your data set.

      • CC2: Reply to AC1, Heidi Kreibich, 05 May 2020

        Great that you are interested in our community activity. I'll keep you informed (I'll put you on my e-mail list; if this bothers you later on, just let me know). I guess that, from an ML perspective, there are no specific requirements for information, data types, format, etc. As I understand it so far, ML can use any kind of data and test whether patterns etc. can be identified.

    • CC3: Reply to CC1, Julien Malard-Adam, 07 May 2020


      I would be very interested in participating in the data set (and learning more about it) as well! I am particularly interested in developing a system of modular system dynamics components for sociohydrology (and human-environment interactions in general) that can be tested, connected to data for calibration, and reused between different parts of the world.

      • CC4: Reply to CC3, Heidi Kreibich, 07 May 2020

        Dear Julien Malard,

        Thank you very much for your interest in the Panta Rhei community activity to develop a benchmark dataset. Please send me an e-mail and I'll keep you updated.