Interrogating the Functional Realism of Deep Learning Rainfall–Runoff Models: Diagnostic Insights and Mitigation Strategies

Ara Bayati; Ali A Ameli; Saman Razavi

doi:https://doi.org/10.5194/egusphere-egu26-13847

Session HS3.5

EGU26-13847, updated on 14 Mar 2026


EGU General Assembly 2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.


Ara Bayati¹, Ali A Ameli¹, and Saman Razavi²,³


¹Department of Earth, Ocean and Atmospheric Sciences, The University of British Columbia, Vancouver, British Columbia, Canada
²School of Environment and Sustainability, Global Institute for Water Security, University of Saskatchewan, Saskatoon, Canada
³School of Civil and Environmental Engineering, Faculty of Engineering, University of New South Wales (UNSW), Sydney, Australia

Deep learning rainfall–runoff models can achieve high predictive accuracy, yet still rely on correlation-driven shortcuts that are not defensible as catchment-scale mechanisms. This raises a central question: how far can correlation-driven learning be trusted to produce simulations that are hydrologically realistic, not just statistically accurate? To address this, we evaluate functional realism, defined as the extent to which a model’s internal functioning aligns with defensible mechanisms of streamflow generation. We propose a hydrology-specific Explainable AI (XAI) framework that extracts nonlinear, lag-dependent, time-varying impulse response functions (IRFs) describing how a long short-term memory (LSTM) network internally maps isolated impulses in precipitation (P), temperature (T), and potential evapotranspiration (PET) to simulated streamflow. Applied to 672 North American catchments where the LSTM demonstrated strong predictive skill, the IRFs reveal systematic functional inconsistencies masked by accuracy: in over 70% of rain-dominated catchments, short-term rises in T are associated with increased simulated streamflow and enhanced celerity even without rainfall; in snow-dominated catchments, PET is frequently treated as a proxy driver of snowmelt-related flow. We then discuss plausible origins of spurious functional learning, including seasonal confounding, heterogeneous regime mixing during training, simplicity bias (shortcut learning), and omitted drivers or missing processes. We also outline practical routes to reduce spurious learning by directly addressing these sources through input handling, regime-aware training, and targeted model adjustments.
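As a rough illustration of what an impulse response function measures, the sketch below perturbs one forcing variable at a single time step and records how the simulated streamflow changes at each subsequent lag. It is a generic finite-difference perturbation experiment, not the framework proposed in the abstract, whose extraction procedure is not described here. The names `toy_model` and `impulse_response`, the linear-reservoir stand-in for a trained LSTM, the synthetic forcing, and all parameter values are assumptions made for illustration only.

```python
import numpy as np

def toy_model(forcing):
    """Hypothetical stand-in for a trained LSTM: a linear reservoir driven by P only.

    forcing has shape (n_steps, 3) with columns P, T, PET; returns flow of shape (n_steps,).
    """
    k = 0.2  # assumed fraction of storage released per time step
    storage = 0.0
    flow = np.zeros(forcing.shape[0])
    for t, p in enumerate(forcing[:, 0]):
        storage += p           # precipitation fills the store
        flow[t] = k * storage  # outflow is proportional to storage
        storage -= flow[t]
    return flow

def impulse_response(model, forcing, var_idx, t0, size, max_lag):
    """Change in simulated flow per unit impulse in one input, for lags 0..max_lag."""
    perturbed = forcing.copy()
    perturbed[t0, var_idx] += size               # isolated impulse at time t0
    diff = model(perturbed) - model(forcing)     # response relative to the baseline run
    return diff[t0:t0 + max_lag + 1] / size

# Synthetic forcing (assumed values, not real catchment data).
rng = np.random.default_rng(0)
forcing = np.column_stack([
    rng.gamma(0.5, 4.0, 200),    # P
    rng.normal(10.0, 5.0, 200),  # T
    rng.uniform(0.0, 5.0, 200),  # PET
])

irf_p = impulse_response(toy_model, forcing, var_idx=0, t0=100, size=1.0, max_lag=10)
irf_t = impulse_response(toy_model, forcing, var_idx=1, t0=100, size=1.0, max_lag=10)
```

Because the toy model ignores temperature by construction, `irf_t` is zero at every lag, while `irf_p` decays geometrically with lag. For a model that has learned a temperature shortcut of the kind reported above, the temperature response would be nonzero even in the absence of rainfall. For a nonlinear model such as an LSTM, the response would also depend on the impulse size and on the time `t0` at which it is applied, which is why the abstract describes the IRFs as nonlinear and time-varying.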

How to cite: Bayati, A., Ameli, A. A., and Razavi, S.: Interrogating the Functional Realism of Deep Learning Rainfall–Runoff Models: Diagnostic Insights and Mitigation Strategies, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-13847, https://doi.org/10.5194/egusphere-egu26-13847, 2026.