Assessing behavior and performance of deep learning models using dynamic fingerprints of hydrological behavior

James Kirchner

https://doi.org/10.5194/egusphere-egu26-3597

Session HS3.5

EGU26-3597, updated on 13 Mar 2026

EGU General Assembly 2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

Retired from ETH Zurich, Dept. of Environmental Systems Science, Zurich, Switzerland (kirchner@ethz.ch)

Deep learning models, such as long short-term memory (LSTM) networks, are becoming widely adopted as the tool of choice for rainfall-runoff forecasting, reflecting their impressive performance in goodness-of-fit tests. Nonetheless, it remains unclear exactly how this performance is achieved, and concerns have been raised regarding the functional realism embedded in such models (Bayati et al., 2026) and their ability to extrapolate beyond the range of their training data (Baste et al., 2025). An underlying problem, shared by machine learning models and conventional mechanistic models alike, is that they are trained and tested almost exclusively using goodness-of-fit measures relative to observed discharge time series. Such goodness-of-fit tests emphasize some aspects of model behavior but obscure others.

Thirty years ago, Kirchner et al. (1996) proposed a more diagnostic approach to model evaluation, in which the relationships of primary interest are statistically extracted from both the model behavior and the real-world data, and then compared. When carefully done, this comparison can highlight how the relevant forcing factors are related to the outcome variables, in the model and in the real world. Here I illustrate this approach by comparing LSTM behavior with real-world rainfall-runoff relationships, using nonlinear and nonstationary impulse response functions estimated by Ensemble Rainfall-Runoff Analysis (ERRA). These impulse response functions are analogous to classical unit hydrographs, with the important distinction that they can depend nonlinearly on precipitation intensity and on antecedent wetness or other time-varying attributes. They serve as dynamic fingerprints of how measured and modeled streamflows respond to precipitation, and of how that response is shaped by ambient conditions and catchment characteristics. Examples of this approach, and insights derived from it, will be presented.
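The general idea of extracting the same fingerprint from measured and modeled streamflow can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ERRA implementation: it estimates impulse response functions by ordinary least-squares deconvolution, and it represents nonlinearity simply by splitting precipitation into two intensity classes at an arbitrary threshold. All function names, kernels, and the synthetic "observed" and "modeled" series are hypothetical, and the "modeled" series is a stand-in for output from an LSTM or any other model.

```python
import numpy as np


def lagged_matrix(precip, n_lags):
    """Design matrix whose column k holds precipitation lagged by k time steps."""
    precip = np.asarray(precip, dtype=float)
    n = len(precip)
    return np.column_stack(
        [precip[n_lags - 1 - k : n - k] for k in range(n_lags)]
    )


def estimate_irf(precip, flow, n_lags):
    """Least-squares estimate of one linear, stationary impulse response."""
    X = lagged_matrix(precip, n_lags)
    y = np.asarray(flow, dtype=float)[n_lags - 1 :]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs


def estimate_irf_by_intensity(precip, flow, n_lags, threshold):
    """Separate impulse responses for low- and high-intensity precipitation.

    Assumption for illustration: two intensity classes split at `threshold`,
    fitted jointly by ordinary least squares.
    """
    precip = np.asarray(precip, dtype=float)
    low = np.where(precip <= threshold, precip, 0.0)
    high = np.where(precip > threshold, precip, 0.0)
    X = np.hstack([lagged_matrix(low, n_lags), lagged_matrix(high, n_lags)])
    y = np.asarray(flow, dtype=float)[n_lags - 1 :]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs[:n_lags], coeffs[n_lags:]


# Synthetic, noise-free demonstration (hypothetical data, not real catchments).
rng = np.random.default_rng(0)
n_steps, n_lags, threshold = 2000, 4, 1.0
precip = rng.exponential(1.0, n_steps) * (rng.random(n_steps) < 0.3)
low = np.where(precip <= threshold, precip, 0.0)
high = precip - low

k_low = np.array([0.2, 0.3, 0.2, 0.1])     # damped response to light rain
k_high = np.array([0.6, 0.25, 0.1, 0.05])  # flashier response to heavy rain

# "Observed" flow responds nonlinearly to intensity; the stand-in "model"
# applies the light-rain response to all precipitation.
observed = (np.convolve(low, k_low) + np.convolve(high, k_high))[:n_steps]
modeled = np.convolve(precip, k_low)[:n_steps]

obs_low, obs_high = estimate_irf_by_intensity(precip, observed, n_lags, threshold)
mod_low, mod_high = estimate_irf_by_intensity(precip, modeled, n_lags, threshold)

print("peak response to heavy rain, observed:", obs_high.max())
print("peak response to heavy rain, modeled: ", mod_high.max())
```

In this toy case the two series share the same light-rain response but differ in their heavy-rain response, a discrepancy that the fingerprint comparison exposes directly. Real streamflow data would additionally require handling of noise, serially correlated residuals, and uncertainty estimation, which this sketch omits.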

How to cite: Kirchner, J.: Assessing behavior and performance of deep learning models using dynamic fingerprints of hydrological behavior, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-3597, https://doi.org/10.5194/egusphere-egu26-3597, 2026.