The relationship between theoretical maximum prediction limits of the LSTM and network size

Daniel Klotz; Sanika Baste; Ralf Loritz; Martin Gauch; Frederik Kratzert

doi:https://doi.org/10.5194/egusphere-egu25-10650

Session HS3.4

EGU25-10650, updated on 15 Mar 2025

EGU General Assembly 2025

© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.

Daniel Klotz¹,², Sanika Baste³, Ralf Loritz³, Martin Gauch⁴, and Frederik Kratzert²

¹IT:U Interdisciplinary Transformation University, Linz, Austria
²Google Research, Vienna, Austria
³Institute of Water and Environment, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
⁴Google Research, Zurich, Switzerland

Machine learning is increasingly important for rainfall–runoff modelling. In particular, the community has widely adopted the Long Short-Term Memory (LSTM) network. One of the most important established best practices in this context is to train LSTMs on a large number of diverse basins (Kratzert et al., 2019; 2024). Intuitively, the reason for this practice is that training deep learning models on small and homogeneous data sets (e.g., data from only a single hydrological basin) leads to poor generalization, especially for high flows.

To examine this behavior, Kratzert et al. (2024) use a theoretical maximum prediction limit for LSTMs. This limit is computed as the L1 norm (i.e., the sum of the absolute values of the vector components) of the learned weight vector that relates the hidden states to the estimated streamflow. Because each LSTM hidden state is bounded between -1 and 1, the weighted sum of the hidden states cannot exceed this value in magnitude. For random weight vectors, we could therefore obtain larger theoretical limits simply by increasing the size of the network (i.e., the number of parameters). However, since LSTMs are trained using gradient descent, the relationship for trained models is more intricate.
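The limit for untrained weight vectors can be illustrated with a minimal sketch. It is not the authors' code: the function names, the optional bias term, the hidden sizes, and the uniform initialization range of ±1/sqrt(n) are all assumptions chosen for illustration, and the resulting numbers would live in whatever (possibly normalized) units the model output uses.

```python
import math
import random


def prediction_limit(weights, bias=0.0):
    """Upper bound on |w . h + b| when every hidden state h_i lies in [-1, 1].

    The bias argument is an assumption for illustration; with the default of
    0.0 the bound reduces to the L1 norm of the weight vector.
    """
    return sum(abs(w) for w in weights) + abs(bias)


def random_head(hidden_size, rng):
    """Draw untrained head weights uniformly from [-1/sqrt(n), 1/sqrt(n)].

    This initialization range is an assumed, hypothetical choice.
    """
    bound = 1.0 / math.sqrt(hidden_size)
    return [rng.uniform(-bound, bound) for _ in range(hidden_size)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
limits = {}
for hidden_size in (16, 64, 256, 1024):
    limits[hidden_size] = prediction_limit(random_head(hidden_size, rng))
    print(hidden_size, round(limits[hidden_size], 2))
```

Under this assumed initialization, the expected absolute value of each weight is 1/(2·sqrt(n)), so the expected limit of an untrained head with n hidden units is sqrt(n)/2. The limit grows with network size even though the individual weights shrink. A trained model need not follow this scaling.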

This contribution explores the relationship between the theoretical limit and the network size. In particular, we look at how increasing the network size of untrained models increases the prediction limit, and contrast this with the scaling behavior of trained models.

How to cite: Klotz, D., Baste, S., Loritz, R., Gauch, M., and Kratzert, F.: The relationship between theoretical maximum prediction limits of the LSTM and network size, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-10650, https://doi.org/10.5194/egusphere-egu25-10650, 2025.