EGU General Assembly 2020
© Author(s) 2020. This work is distributed under
the Creative Commons Attribution 4.0 License.

Towards deep learning based flood forecasting for ungauged basins

Frederik Kratzert1, Daniel Klotz1, Guy Shalev2, Sella Nevo2, Günter Klambauer1, Grey Nearing3, and Sepp Hochreiter1
  • 1Johannes Kepler University, Institute for Machine Learning, Wien, Austria
  • 2Google LLC, Google Research, Israel
  • 3Department of Geological Sciences, University of Alabama, Tuscaloosa, AL, USA

Floods are among the most destructive natural hazards in the world. To reduce flood-induced damages and casualties, streamflow forecasts should be as accurate as possible.

As of today, streamflow forecasts are usually made with either conceptual or process-based hydrological models. These models typically perform best when calibrated for a specific basin, and their performance degrades drastically in places without historic streamflow measurements. To make things worse, some of the most devastating floods occur in developing and low-income countries, where historic streamflow records are scarce. Therefore, a central task for enhancing flood forecasts and helping local authorities manage these areas is to provide high-quality streamflow forecasts for ungauged rivers. Although the IAHS dedicated an entire decade (2003-2012) to advancing Prediction in Ungauged Basins, the central goal remains largely unsolved.

In this talk, we will present a novel data-driven approach to prediction in ungauged basins. More concretely, we show that the Long Short-Term Memory network (LSTM), a special type of deep learning model, can serve as a generalizable rainfall-runoff simulation model. We will present recent results indicating that the LSTM gives, on average, better out-of-sample predictions (i.e., predictions for basins treated as ungauged) than, e.g., the SAC-SMA calibrated in-sample (gauged) or the US National Water Model (Kratzert et al., 2019).
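To make the distinction between in-sample (gauged) and out-of-sample (ungauged) evaluation concrete, the following is a minimal sketch of a basin-wise k-fold split. It is an illustration under stated assumptions, not the exact protocol of the cited study: the function name `basin_kfold`, the basin identifiers, and the choice of `k` are all hypothetical. The key property is that no streamflow from a held-out basin is seen during training, so the model is evaluated as if that basin were ungauged.

```python
import random


def basin_kfold(basin_ids, k, seed=0):
    """Split basins into k folds; each fold is held out once as 'ungauged'."""
    ids = sorted(basin_ids)
    random.Random(seed).shuffle(ids)
    # Deal the shuffled basins round-robin into k folds.
    folds = [ids[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [b for j, fold in enumerate(folds) if j != i for b in fold]
        splits.append((train, test))
    return splits


# Hypothetical basin identifiers, for illustration only.
basins = [f"basin_{n:02d}" for n in range(10)]
splits = basin_kfold(basins, k=5)

for train, test in splits:
    # A basin used for testing never contributes training data.
    assert not set(train) & set(test)
```

By contrast, an in-sample (gauged) benchmark is calibrated and evaluated on the same basin, typically on different time periods.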

One place where these research results are already finding their way into operation is Google’s Flood Forecasting Initiative. The goal of this initiative is to provide (enhanced) flood warnings where needed, starting with a pilot project in India. As mentioned above, historic streamflow records in such regions are scarce, which motivates new and innovative approaches for enhanced streamflow forecasting.


Kratzert, F., Klotz, D., Herrnegger, M., Sampson, A. K., Hochreiter, S., & Nearing, G. S.: Toward improved predictions in ungauged basins: Exploiting the power of machine learning. Water Resources Research, 55, 2019.

How to cite: Kratzert, F., Klotz, D., Shalev, G., Nevo, S., Klambauer, G., Nearing, G., and Hochreiter, S.: Towards deep learning based flood forecasting for ungauged basins, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-8932, 2020

Comments on the presentation

AC: Author Comment | CC: Community Comment

Presentation version 1 – uploaded on 02 May 2020
  • CC1: Comment on EGU2020-8932, Dhruvesh Patel, 04 May 2020


    What is the Google Faculty Research Award?

    Could you share in which city or river basin in India your pilot project will be carried out?

    • AC1: Reply to CC1, Frederik Kratzert, 04 May 2020

      Re Faculty Research Award:

      Re River in India, see the last slide in the presentation. Note: I am personally not working on the pilot project but am just sharing research work with Google.

  • CC2: Comment on EGU2020-8932, Maik Renner, 05 May 2020

    Hi Frederik,

    this is great and promising work - it seems like human-made models did not capture the key ingredients.

    Now, I have a few practical questions for my case of doing flood forecasting:

    How can I use your PUB-LSTM for a specific river gauge? That is, what are the critical input data?

    Do I need to perform another training of the LSTM when applying it for some rivers, say in Germany, with strong anthropogenic alterations?

    Thank you!

    • AC2: Reply to CC2, Frederik Kratzert, 05 May 2020

      Hi Maik,

      let's start with the inputs:

      TL;DR: (multiple) meteorological variables, catchment attributes (needed so that the model can learn to distinguish hydrological behavior between basins), and possibly additional time series and features that describe the human influence.

      Long explanation: Critical inputs are similar to those of classical hydrological models, most importantly precipitation. The models we currently run use precipitation, temperature (daily min and max), vapor pressure and solar radiation as inputs. However, not all of these are strictly necessary: I have already trained models with only precipitation and temperature, with similar success. Current best practice, however, is to use as many different forcing products as you have (a paper will be submitted/uploaded in the next few days). That is, you can provide the network with multiple precipitation products at the same time, and the network learns to combine them in a meaningful way during training. The other thing you need are catchment attributes, so that the model can learn different rainfall-runoff processes depending on the type of catchment. There is no strict rule here. However, using our expert knowledge, we should come up with a list of features that we think could suffice to distinguish between types of basins. If you have human-influenced basins, you certainly want to include some features that describe the degree of human influence/regulation. With hydropower dams, for example, it might also make sense to add additional time series features, like the electricity price, so that the model can get an idea of when the dams will start to release water.
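As a rough illustration of how such inputs could be arranged for a sequence model, the sketch below builds one feature vector per time step by appending a basin's static attributes to the dynamic forcings of that step. This is one common layout and is assumed here rather than taken from the comment above. The function `build_inputs`, all feature names, and all numbers are hypothetical. Note how two precipitation products simply become two separate input columns.

```python
def build_inputs(forcings, attributes, forcing_names, attribute_names):
    """Return one feature vector per time step: the dynamic forcings of that
    step followed by the basin's static attributes, repeated at every step."""
    static = [attributes[name] for name in attribute_names]
    n_steps = len(forcings[forcing_names[0]])
    sequence = []
    for t in range(n_steps):
        dynamic = [forcings[name][t] for name in forcing_names]
        sequence.append(dynamic + static)
    return sequence


# All names and numbers below are made up for illustration.
forcing_names = ["precip_product_a", "precip_product_b", "temp_min", "temp_max"]
attribute_names = ["area_km2", "mean_elevation_m", "degree_of_regulation"]

forcings = {
    "precip_product_a": [0.0, 12.4, 3.1],
    "precip_product_b": [0.2, 10.9, 4.0],
    "temp_min": [1.5, 2.0, 0.5],
    "temp_max": [9.0, 7.5, 6.0],
}
attributes = {
    "area_km2": 512.0,
    "mean_elevation_m": 830.0,
    "degree_of_regulation": 0.3,
}

# x has one row per day and one column per forcing or attribute.
x = build_inputs(forcings, attributes, forcing_names, attribute_names)
```

Additional time series, such as an electricity price for regulated basins, would enter the same way as one more entry in `forcing_names`.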

      Using the LSTM for rivers in Germany:

      TL;DR: You definitely want to train it on as much local data as possible. Furthermore, the model was trained on meteorological data that is only available in North America.

      Long explanation: The models presented in the two linked papers were both trained on the CAMELS US data set, which consists mostly of basins undisturbed by humans. The CAMELS data set contains forcing products that are only available in North America, as well as catchment attributes derived from US national data products. Thus, direct application to German basins does not make sense. You certainly want the model to be trained with the same kind of data that you later use for inference. We are currently compiling a global rainfall-runoff data set (which will be released under an open source license) that will allow training global models on all available data worldwide, which could then be applied to any basin. But this is still work in progress. The data set makes use of streamflow data that is released under a permissive license. If you are aware of such data for German/European basins, please feel free to get in contact.
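The requirement that inference data match the training data can be enforced mechanically. The following is a minimal sketch, assuming inputs are standardized with statistics computed on the training set. The helper names `fit_scaler` and `prepare_for_inference` and the feature names are hypothetical. The second function refuses inputs whose feature set differs from the one seen in training, which is what would happen if, for example, a forcing product available only in North America were missing for a European basin.

```python
def fit_scaler(rows, feature_names):
    """Compute per-feature mean and standard deviation on the training data."""
    stats = {}
    for i, name in enumerate(feature_names):
        values = [row[i] for row in rows]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        # Fall back to 1.0 for constant features to avoid division by zero.
        stats[name] = (mean, var ** 0.5 or 1.0)
    return stats


def prepare_for_inference(row, stats):
    """Standardize a dict of named inputs with the training statistics.

    Raises ValueError if the feature set differs from the training one."""
    missing = sorted(set(stats) - set(row))
    unexpected = sorted(set(row) - set(stats))
    if missing or unexpected:
        raise ValueError(
            f"feature mismatch: missing={missing}, unexpected={unexpected}"
        )
    return [(row[name] - mean) / std for name, (mean, std) in stats.items()]


# Made-up training data with two features, for illustration only.
feature_names = ["precip", "temp_max"]
train_rows = [[0.0, 10.0], [4.0, 20.0]]
stats = fit_scaler(train_rows, feature_names)

scaled = prepare_for_inference({"precip": 4.0, "temp_max": 15.0}, stats)
```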

      • CC3: Reply to AC2, Maik Renner, 05 May 2020

        Hi Frederik,

        thank you for giving me the details. It's good to see that there is no magic included! The inputs are completely reasonable to me and I am inclined to give it a try for our basins in eastern Germany. I will check out the actual licence of the hydrological data.

        I am looking forward to your global analysis.