HS3.3 | Explainable and hybrid machine learning in hydrology and Earth system sciences
Co-organized by ESSI1/NP1
Convener: Shijie Jiang (ECS) | Co-conveners: Ralf Loritz (ECS), Lu Li (ECS), Basil Kraft (ECS), Dapeng Feng (ECS)
Orals | Wed, 30 Apr, 08:30–12:15 (CEST) | Room 3.16/17
Posters on site | Attendance Tue, 29 Apr, 16:15–18:00 (CEST) | Display Tue, 29 Apr, 14:00–18:00 | Hall A
Posters virtual | Attendance Tue, 29 Apr, 14:00–15:45 (CEST) | Display Tue, 29 Apr, 14:00–18:00 | vPoster spot A
The complexity of hydrological and Earth systems poses significant challenges to our ability to predict and understand them. Machine learning (ML) provides powerful tools for modeling these complex systems. However, realizing its full potential in this field takes more than algorithms and data: it requires close cooperation between domain knowledge and data-driven methods. This session explores the frontier of this convergence and how it fosters a deeper understanding of hydrological processes and their interactions with the atmosphere and biosphere across spatial and temporal scales.

We invite researchers working in the fields of explainable AI, physics-informed ML, hybrid Earth system modeling (ESM), and AI for causal and equation discovery in hydrology and Earth system sciences to share their methodologies, findings, and insights. Submissions are welcome on topics including, but not limited to:

- Explainability and transparency in ML/AI modeling of hydrological and Earth systems;
- Process and knowledge integration in ML/AI models;
- Data assimilation and hybrid ESM approaches;
- Causal learning and inference in ML models;
- Data-driven equation discovery in hydrological and Earth systems;
- Data-driven process understanding in hydrological and Earth systems;
- Challenges, limitations, and solutions related to hybrid models and XAI.

Orals: Wed, 30 Apr | Room 3.16/17

The oral presentations are given in a hybrid format supported by a Zoom meeting featuring on-site and virtual presentations. The button to access the Zoom meeting appears just before the time block starts.
Chairpersons: Shijie Jiang, Dapeng Feng, Ralf Loritz
08:30–08:35
08:35–08:45 | EGU25-10256 | ECS | On-site presentation
Yuanhao Xu and Kairong Lin

The formation of floods is a complex physical process whose driving factors change dynamically over time and space under climate change. Because deep learning is a black box, using it alone does not improve our understanding of hydrological processes; the challenge is to use deep learning to uncover new knowledge about flood formation mechanisms. This study proposes an interpretable deep learning framework for flood modeling. The framework applies interpretability techniques to elucidate the inner workings of a peak-sensitive Informer, revealing the dynamic response of floods to their driving factors in 482 watersheds across the United States. Accurate simulation is a prerequisite for interpretability techniques to provide reliable information. Compared with a Transformer and an LSTM, the Informer performed best in simulating flood peaks (NSE above 0.6 in 70% of watersheds). Interpreting the Informer's decision-making process identified three primary flood-inducing patterns: precipitation, excess soil water, and snowmelt. The controlling effect of the dominant factors is regional, and their influence on floods differs markedly across time steps. This challenges the traditional view that variables closer to the time of the flood event have a greater impact. More than 40% of watersheds showed shifts in their dominant driving factors between 1981 and 2020, with precipitation-dominated watersheds changing most, consistent with responses to climate change. The study also reveals interactions and dynamic shifts among the variables. These findings suggest that interpretable deep learning, through reverse deduction, turns data-driven models from mere fitters of nonlinear relationships into effective tools for understanding hydrological characteristics.
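The skill criterion quoted above (NSE above 0.6) refers to the Nash-Sutcliffe efficiency, NSE = 1 - Σ(Q_obs - Q_sim)² / Σ(Q_obs - mean(Q_obs))². The short Python sketch below computes this metric and the share of watersheds that exceed a threshold. The function names and toy series are illustrative assumptions, not the authors' code or data.

```python
from typing import Sequence


def nse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - SSE / (sum of squared deviations of obs from their mean)."""
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0.0:
        raise ValueError("NSE is undefined for a constant observed series")
    return 1.0 - sse / sst


def share_above(scores: Sequence[float], threshold: float = 0.6) -> float:
    """Fraction of watersheds whose score exceeds the threshold."""
    return sum(score > threshold for score in scores) / len(scores)


# Hypothetical peak-flow series for one gauge and three candidate simulations
obs = [1.0, 3.0, 2.0, 5.0]
sims = {
    "perfect": [1.0, 3.0, 2.0, 5.0],    # NSE = 1
    "mean_only": [2.75] * 4,            # predicting the observed mean gives NSE = 0
    "close": [1.2, 2.8, 2.1, 4.7],
}
scores = [nse(obs, sim) for sim in sims.values()]
print(scores, share_above(scores))
```

In a multi-watershed study, `scores` would hold one NSE per watershed, and `share_above` would return the fraction of watersheds above 0.6.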

How to cite: Xu, Y. and Lin, K.: Uncovering the Dynamic Drivers of Floods through Interpretable Deep Learning, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-10256, https://doi.org/10.5194/egusphere-egu25-10256, 2025.

08:45–08:55 | EGU25-15651 | ECS | On-site presentation
Michael Engel, Stefan Kunz, Maria Wetzel, and Marco Körner

Groundwater is a critical resource for drinking water supply, agriculture, and ecosystems in general. In regions facing water scarcity, such as Brandenburg (Germany), effective groundwater management is essential. This requires accurate assessments of groundwater dynamics, which data-driven models can deliver through efficient and reliable groundwater level (GWL) predictions. To develop and apply such models effectively, it is crucial to understand which input features influence the groundwater level prediction, and how.

Our primary objective is to use feature attribution methods to assess the impact of the input features of a deep learning (DL) model that predicts GWLs. Specifically, we examine the influence of climatic features and of different land use patterns. This study employs a global DL model based on the Long Short-Term Memory (LSTM) architecture to predict seasonal GWLs over a 16-week horizon. We use a comprehensive set of features, including dynamic features such as climatic variables (e.g., temperature, precipitation, relative humidity) and static features such as Corine land cover. By incorporating these, we aim to capture the complex interactions between climate, land use, and groundwater levels.

For the feature attribution itself, we apply the Shapley value sampling method, which analyses how altering an input feature affects a chosen objective function. The choice of this function is essential for the results obtained. We vary the objective function in three distinct ways:

- by using the total change of the predicted GWL over the whole period of interest;
- per prediction horizon, i.e. per predicted week of the 16-week prediction;
- through a decomposition of the period of interest into scale-specific partial signals using the discrete wavelet transform.

Besides showing which input features are most important for the predictive performance of the LSTM model, the results allow us to identify further aspects of the dynamics learned by the model. Examples are whether and when the model switches from extrapolation to prediction, and at which temporal scales different factors play a role, e.g. whether forest vegetation matters more for seasonal or for weekly effects on groundwater levels. This multi-faceted approach gives a deeper understanding of the factors influencing GWLs and their temporal dynamics, for both static and dynamic input features. Ultimately, feature attribution methods can raise awareness of reasonable land use, thereby supporting groundwater management, and lead to better predictive models.
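Shapley value sampling can be sketched as a Monte Carlo average over random feature orderings. Starting from a reference input, features are switched to their actual values one at a time, and each switch is credited with the resulting change in the objective. The Python sketch below rests on two assumptions. It uses a single baseline vector as the reference, which is one common convention but not necessarily the study's. It also replaces the LSTM with a hypothetical toy model that has a 16-week output. Because the objective is a plain function, the total-change and per-week variants described above differ only in the function passed in.

```python
import random
from typing import Callable, List, Sequence


def shapley_value_sampling(
    objective: Callable[[List[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
    n_permutations: int = 200,
    seed: int = 0,
) -> List[float]:
    """Monte Carlo Shapley estimate over random feature orderings."""
    rng = random.Random(seed)
    n = len(x)
    phi = [0.0] * n
    for _ in range(n_permutations):
        order = list(range(n))
        rng.shuffle(order)
        current = list(baseline)            # start from the reference input
        previous = objective(current)
        for i in order:
            current[i] = x[i]               # alter feature i to its actual value
            value = objective(current)
            phi[i] += value - previous      # credit the change to feature i
            previous = value
    return [p / n_permutations for p in phi]


def toy_gwl_model(features: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for the LSTM: 16 weekly GWL anomalies."""
    temperature, precipitation, forest_share = features
    return [0.5 * precipitation - 0.05 * week * temperature + 0.2 * forest_share * precipitation
            for week in range(16)]


x = [1.5, 2.0, 0.4]
baseline = [0.0, 0.0, 0.0]
phi_total = shapley_value_sampling(lambda f: sum(toy_gwl_model(f)), x, baseline)
phi_week10 = shapley_value_sampling(lambda f: toy_gwl_model(f)[9], x, baseline)  # 10th week
print(phi_total, phi_week10)
```

A wavelet-scale objective would follow the same pattern: pass a function that returns one scale's component of the predicted series.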

How to cite: Engel, M., Kunz, S., Wetzel, M., and Körner, M.: Multitemporal and Multiscale Feature Attribution Methods to Understand the Impact of Climatic and Land Use Features on the Prediction of Groundwater Levels, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-15651, https://doi.org/10.5194/egusphere-egu25-15651, 2025.

08:55–09:05 | EGU25-482 | ECS | On-site presentation
Qiuyang Chen, Simon Mudd, and Simon Moulds

River discharge prediction is critical for water resource management, yet equifinality—where multiple model configurations achieve similar accuracy—complicates process understanding. We explored this phenomenon using Long Short-Term Memory (LSTM) models trained on UK river basins, incorporating geomorphic descriptors derived from Digital Terrain Models and other environmental features from the CAMELS-GB dataset, including land cover, soil, and climate variables. Explainable AI techniques revealed that the models rely on different, yet equally effective, combinations of correlated features to achieve comparable performance. This variability underscores the complexity of hydrological systems and highlights the importance of integrating explainability and domain knowledge in machine learning to enhance model interpretability and robustness.
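The following toy illustration of the equifinality described above assumes two perfectly correlated descriptors and two linear models, with all values hypothetical. Both models produce identical predictions, yet a gradient-times-input attribution credits entirely different features. Real catchment descriptors are rarely perfectly correlated, but strong correlation creates the same ambiguity in a weaker form.

```python
from typing import List, Sequence


def predict(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear model prediction."""
    return sum(w * f for w, f in zip(weights, features))


def gradient_x_input(weights: Sequence[float], features: Sequence[float]) -> List[float]:
    """Gradient x input attribution; for a linear model the gradient is the weight vector."""
    return [w * f for w, f in zip(weights, features)]


# Hypothetical catchments in which descriptor 2 is exactly twice descriptor 1
catchments = [[s, 2.0 * s] for s in (0.5, 1.0, 3.0)]
model_a = [1.0, 0.0]   # relies only on descriptor 1
model_b = [0.0, 0.5]   # relies only on descriptor 2

for features in catchments:
    print(predict(model_a, features), predict(model_b, features),
          gradient_x_input(model_a, features), gradient_x_input(model_b, features))
```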

How to cite: Chen, Q., Mudd, S., and Moulds, S.: Equifinality in River Discharge Prediction Revealed Through Explainable AI, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-482, https://doi.org/10.5194/egusphere-egu25-482, 2025.

09:05–09:15 | EGU25-16971 | ECS | On-site presentation
Sanika Baste, Daniel Klotz, Eduardo Espinoza, Andras Bardossy, and Ralf Loritz

Long Short-Term Memory (LSTM) networks have shown strong performance in rainfall–runoff modelling, often surpassing conventional hydrological models in benchmark studies. However, recent studies raise questions about their ability to extrapolate, particularly under extreme conditions that exceed the range of their training data. This study examines the performance of a stand-alone LSTM trained on 196 catchments in Switzerland when subjected to synthetic design precipitation events of increasing intensity and varying duration. The model’s response is compared to that of a hybrid model and evaluated against hydrological process understanding. Our study reiterates that the stand-alone LSTM has a theoretical prediction limit, and we show that this limit lies below the range of the data the model was trained on. Saturation of the LSTM cell states alone does not fully explain this characteristic behaviour, because the cell states do not reach full saturation, particularly for the 1-day events. Instead, the gating mechanisms prevent new information about the current extreme precipitation from being incorporated into the cell states. Adjusting the LSTM architecture (for instance, by increasing the number of hidden states) and/or using a larger, more diverse training dataset can help mitigate the problem. However, these adjustments do not guarantee better extrapolation. The LSTM continues to predict values below the range of the training data, or to produce hydrologically infeasible runoff responses, in the 1-day design experiments. Despite these shortcomings, our findings highlight the inherent potential of stand-alone LSTMs to capture complex hydro-meteorological relationships. We argue that more robust training strategies and model configurations could address the observed limitations and realise the promise of stand-alone LSTMs for rainfall–runoff modelling.
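A theoretical prediction limit follows from the standard LSTM equations. The hidden state is h_t = o_t ⊙ tanh(c_t), with output gate o_t in (0, 1), so every hidden unit lies strictly between -1 and 1. A linear output head y = w·h_t + b therefore can never exceed b + Σ|w_i|. The Python sketch below assumes such a single linear head, which is common in rainfall–runoff LSTMs but an assumption here, and uses hypothetical weights. If the targets are normalised, the bound applies in normalised units.

```python
import math
from typing import Sequence


def output_upper_bound(head_weights: Sequence[float], head_bias: float) -> float:
    """Supremum of y = w . h + b when every hidden unit satisfies |h_i| < 1."""
    return head_bias + sum(abs(w) for w in head_weights)


def head_output(head_weights: Sequence[float], head_bias: float,
                cell_state: Sequence[float], output_gate: Sequence[float]) -> float:
    """Linear head applied to h = o * tanh(c) for one time step."""
    hidden = [o * math.tanh(c) for o, c in zip(output_gate, cell_state)]
    return head_bias + sum(w * h for w, h in zip(head_weights, hidden))


w = [0.8, -1.2, 0.5]   # hypothetical trained head weights
b = 0.3
bound = output_upper_bound(w, b)
# Large cell states with signs matching the weights push the output towards, never past, the bound
y = head_output(w, b, cell_state=[3.0, -3.0, 3.0], output_gate=[0.99, 0.99, 0.99])
print(bound, y)
```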

How to cite: Baste, S., Klotz, D., Espinoza, E., Bardossy, A., and Loritz, R.: The Extrapolation Dilemma in Hydrology: Unveiling the extrapolation properties of data-driven models, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-16971, https://doi.org/10.5194/egusphere-egu25-16971, 2025.

09:15–09:25
09:25–09:35 | EGU25-12928 | ECS | Highlight | On-site presentation
Fangzheng Ruan, Oscar M. Baez-Villanueva, Olivier Bonte, Akash Koppa, Wantong Li, Gustau Camps Valls, Yuting Yang, and Diego G. Miralles

Terrestrial evaporation (E) is a critical component of the water cycle, returning nearly 60% of continental precipitation to the atmosphere and dissipating approximately 50% of surface net radiation. A prevalent approach for estimating E involves computing a theoretical maximum, known as potential evaporation (Ep), and scaling it by a multiplicative stress factor. This factor is often referred to as “evaporative stress” (S), or as “transpiration stress” (St) when applied specifically to plant transpiration. Like stomatal or surface conductance, St is governed by a complex nonlinear interplay of environmental drivers such as soil moisture, air temperature, radiation, and atmospheric vapor pressure deficit. This interplay is not yet fully understood, which hampers accurate physical modelling of St and limits our ability to understand how sensitive transpiration is to a changing environment.
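Written as equations, the scaling approach described above reads as follows, where T is plant transpiration. This is a minimal statement: cover-specific details, such as separate treatment of tall and short vegetation, are omitted, and the stress factors are taken to lie between 0 and 1.

```latex
E = S \, E_p, \qquad T = S_t \, E_p, \qquad S,\, S_t \in [0, 1]
```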

The fourth generation of the Global Land Evaporation Amsterdam Model (GLEAM4) has produced a global transpiration dataset by integrating multi-source remote sensing data through a hybrid approach, in which Ep is computed with a process-based model and St is calculated with deep neural networks. These neural networks are trained on global eddy covariance and sap flow measurements for both tall and short vegetation. They are informed by a set of environmental and biotic factors: soil moisture, vapor pressure deficit, atmospheric CO2 concentration, wind velocity, air temperature, downwelling shortwave radiation, leaf area index (LAI), and vegetation optical depth. Beyond the predictive capabilities of these deep neural networks, the relationships they have learned between environmental controls and St have not yet been fully explored. It therefore remains uncertain whether GLEAM4 accurately represents real-world processes. To explore these relationships, we employ the SHapley Additive exPlanations (SHAP) method, which quantifies the marginal contributions of predictors to model predictions and thus offers insights into the relative importance of environmental drivers in determining St.
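For a handful of predictors, Shapley values can be computed exactly by enumerating every coalition of features, which makes the quantity that SHAP approximates explicit. The Python sketch below replaces absent features with a single baseline value. This is a simplification, since SHAP typically averages over a background dataset. It also uses a hypothetical two-driver stress function rather than the GLEAM4 networks.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List, Sequence


def exact_shapley(model: Callable[[List[float]], float],
                  x: Sequence[float],
                  baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values by enumerating all feature coalitions (cost grows as 2**n)."""
    n = len(x)

    def value(coalition: FrozenSet[int]) -> float:
        # Features in the coalition take their actual values, the rest their baseline values
        return model([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_stress(drivers: Sequence[float]) -> float:
    """Hypothetical transpiration stress: wetter soil raises St, higher VPD lowers it."""
    soil_moisture, vpd = drivers
    return soil_moisture * (1.0 - 0.5 * vpd)


phi = exact_shapley(toy_stress, x=[0.8, 0.4], baseline=[0.0, 0.0])
print(phi, sum(phi), toy_stress([0.8, 0.4]) - toy_stress([0.0, 0.0]))
```

With eight predictors, exact enumeration needs 2^7 = 128 coalitions per feature, so it remains feasible as a toy check. SHAP's explainers rely on approximations or model-specific algorithms for larger problems.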

Our findings highlight the dominant St drivers across climatic regimes and ecosystems and reveal how their contributions evolve over time. Additionally, we investigate how St responds to shifts in environmental conditions, including climate and vegetation changes, water stress, atmospheric aridity, and rising CO2 levels. Our study improves the global understanding of transpiration dynamics and provides critical insights into the impacts of diverse hydroclimatic drivers, supporting broader applications within the hydrology and climate communities.

How to cite: Ruan, F., M. Baez-Villanueva, O., Bonte, O., Koppa, A., Li, W., Camps Valls, G., Yang, Y., and G. Miralles, D.: Global Vegetation Stress Drivers based on Hybrid Modelling and Explainable AI, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-12928, https://doi.org/10.5194/egusphere-egu25-12928, 2025.

09:35–09:45 | EGU25-4428 | ECS | On-site presentation
Qingsong Xu and Xiao Xiang Zhu

Effective flood forecasting is critical for informed decision-making and timely emergency response. Traditional physical models, which rely on fixed-resolution spatial grids and input parameters, often incur substantial computational costs, limiting their capacity to predict flood peaks accurately and to provide prompt hazard warnings. This work introduces methods to ensure physical consistency in machine learning models, with the aim of developing a fast, stable, accurate, cross-regional, and downscaled neural flood forecasting foundation model. Specifically, we present a Physics-embedded Neural Network that integrates the momentum and mass conservation laws of flood dynamics into a neural network. We further combine this network with a diffusion-based generative model to enhance physical process consistency for long-term, large-scale flood forecasting. We also briefly introduce other models that integrate physics and machine learning. FloodCast incorporates hydrodynamic equations into its loss function to maintain physical consistency, and UrbanFloodCast learns physical consistency from urban flood dynamics data. The performance of these models is analyzed using our proposed FloodCastBench, a comprehensive collection of low-fidelity and high-fidelity flood forecasting datasets and benchmarks. Results on this benchmark demonstrate that incorporating physical consistency significantly enhances flood forecasting accuracy, demystifies the black-box nature of machine learning frameworks, and increases confidence in addressing dynamical systems. Finally, we propose a Spatiotemporal Foundation Model capable of forecasting floods across a variety of scales and regions.
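One common way to encode mass conservation in a loss function is to penalise the discrete residual of the 1D continuity equation ∂h/∂t + ∂q/∂x = 0 alongside the data misfit, where h is water depth and q = hu is unit-width discharge. The minimal Python sketch below illustrates only this general physics-informed loss idea. The grid layout, finite-difference stencil, weight, and function names are assumptions. Momentum conservation and rainfall and infiltration sources are omitted.

```python
from typing import List, Sequence

Grid = Sequence[Sequence[float]]  # field[time_index][cell_index]


def continuity_residuals(h: Grid, q: Grid, dt: float, dx: float) -> List[float]:
    """Residuals of dh/dt + dq/dx = 0: forward difference in time, central in space."""
    residuals = []
    for n in range(len(h) - 1):
        for i in range(1, len(h[n]) - 1):       # interior cells only
            dh_dt = (h[n + 1][i] - h[n][i]) / dt
            dq_dx = (q[n][i + 1] - q[n][i - 1]) / (2.0 * dx)
            residuals.append(dh_dt + dq_dx)
    return residuals


def physics_informed_loss(h_pred: Grid, q_pred: Grid, h_obs: Grid,
                          dt: float, dx: float, physics_weight: float = 0.1) -> float:
    """Mean squared depth error plus a weighted mean squared mass-conservation residual."""
    errors = [(p - o) ** 2 for row_p, row_o in zip(h_pred, h_obs) for p, o in zip(row_p, row_o)]
    residuals = continuity_residuals(h_pred, q_pred, dt, dx)
    data_term = sum(errors) / len(errors)
    physics_term = sum(r * r for r in residuals) / len(residuals)
    return data_term + physics_weight * physics_term


# Hypothetical 2-step, 3-cell prediction: depth rises although discharge is uniform in space,
# so water appears from nowhere and the physics term penalises it.
h_pred = [[1.0, 1.0, 1.0], [1.5, 1.5, 1.5]]
q_pred = [[2.0, 2.0, 2.0], [2.0, 2.0, 2.0]]
print(physics_informed_loss(h_pred, q_pred, h_obs=h_pred, dt=0.5, dx=10.0))
```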

How to cite: Xu, Q. and Zhu, X. X.: Towards Physics-consistent Foundation Models for Flood Forecasting, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-4428, https://doi.org/10.5194/egusphere-egu25-4428, 2025.

09:45–10:05 | EGU25-9176 | ECS | solicited | On-site presentation
Mouad Ettalbi, Pierre-André Garambois, Ngo-Nghi-Truyen Huynh, Emmanuel Ferreira, and Nicolas Baghdadi

The integration of remote sensing observations into hydrological modeling frameworks offers a significant opportunity to improve spatial and temporal predictive capabilities across continental domains. This research introduces a novel hybrid distributed hydrological model that tackles two key challenges. It addresses computational efficiency through a GPU-enabled computational infrastructure. It addresses predictive accuracy by assimilating multi-source remote sensing datasets, specifically satellite-based soil moisture and evapotranspiration, at high spatial (1 km × 1 km) and temporal (hourly) resolution. The model addresses critical challenges in regional hydrological forecasting by leveraging advanced data assimilation techniques and machine learning methods.

The proposed hybrid modeling framework combines physically based distributed hydrological modeling principles with data-driven machine learning approaches, allowing a more comprehensive representation of land surface hydrological processes. A key innovation is the GPU-enabled cell-to-cell routing algorithm, which enables fast and efficient computation of complex hydrological connectivity and water movement across large spatial domains. By integrating remote sensing observations, the methodology improves initial condition specification and parameter estimation, particularly in regions with sparse ground-based measurement networks.
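Cell-to-cell routing on a flow-direction grid can be organised so that it parallelises well. Cells are grouped into levels such that every cell's upstream neighbours belong to earlier levels, and all cells within a level can then be processed together. The Python sketch below shows this ordering with instantaneous flow accumulation only, with no travel time or channel storage. It is a generic illustration under these assumptions, not the authors' GPU algorithm.

```python
from typing import List, Sequence


def routing_levels(downstream: Sequence[int]) -> List[List[int]]:
    """Group cells so that all upstream neighbours of a cell sit in earlier levels.

    downstream[i] is the index of the cell receiving cell i's outflow, or -1 at an outlet.
    """
    n = len(downstream)
    pending_upstream = [0] * n
    for d in downstream:
        if d >= 0:
            pending_upstream[d] += 1
    level = [i for i in range(n) if pending_upstream[i] == 0]   # headwater cells
    levels: List[List[int]] = []
    while level:
        levels.append(level)
        next_level = []
        for i in level:
            d = downstream[i]
            if d >= 0:
                pending_upstream[d] -= 1
                if pending_upstream[d] == 0:                    # all inflows now known
                    next_level.append(d)
        level = next_level
    if sum(len(lv) for lv in levels) != n:
        raise ValueError("flow-direction graph contains a cycle")
    return levels


def route(downstream: Sequence[int], local_runoff: Sequence[float]) -> List[float]:
    """Instantaneous accumulation: each cell's outflow = its local runoff + all inflows."""
    inflow = [0.0] * len(downstream)
    outflow = [0.0] * len(downstream)
    for level in routing_levels(downstream):
        # Cells within one level are independent; a GPU version could update them in
        # parallel, using atomic additions for the shared inflow accumulation.
        for i in level:
            outflow[i] = local_runoff[i] + inflow[i]
            if downstream[i] >= 0:
                inflow[downstream[i]] += outflow[i]
    return outflow


# Hypothetical 4-cell network: cells 0 and 1 drain into 2, which drains into outlet 3
print(route([2, 2, 3, -1], [1.0, 2.0, 3.0, 4.0]))
```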

Preliminary results demonstrate significant improvements in model performance, particularly in capturing the spatial and temporal variability of hydrological states and fluxes. The approach substantially advances current capabilities in hydrological forecasting and offers a promising framework for developing enriched tensorial numerical solvers that address complex hydroclimatic prediction challenges in data-limited environments.

How to cite: Ettalbi, M., Garambois, P.-A., Huynh, N.-N.-T., Ferreira, E., and Baghdadi, N.: GPU-Enabled Cell-to-Cell Routing in a High Resolution Hybrid Distributed Hydrological Model with Multi-Source Remote Sensing Data Assimilation: A Continental-Scale Computational Approach, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-9176, https://doi.org/10.5194/egusphere-egu25-9176, 2025.

Coffee break
Chairpersons: Basil Kraft, Lu Li, Ralf Loritz
10:45–10:55 | EGU25-7382 | On-site presentation
Derek Karssenberg

Neural networks are efficient and effective in predicting system states in hydrology. However, most current approaches lack hydrological flow partitioning, do not allow training on measurements of multiple variables, or cannot tightly integrate physically based components. To address these shortcomings, I propose and evaluate an approach referred to below as the Dynamical System Neural Network (DSNN). The DSNN is a feedforward neural network whose architecture mirrors the component structure of the real-world system it represents. In hydrology, the DSNN represents each water flow (e.g. seepage, snowmelt) by a collection of input, hidden, and output neural layers. Each input is the state of a hydrological storage (e.g. groundwater storage influencing seepage) or another variable (e.g. air temperature influencing snowmelt). These components are interconnected to form a single neural network of the complete dynamical system considered, in which all storages and flows are explicitly quantified. If physical understanding of a flow and its parameterization is available, a known formulation can replace the corresponding neural network component. The DSNN is run forward in time, and gradients are backpropagated over all time steps. It can be run in spatially lumped or semi-distributed mode. To demonstrate the approach, a DSNN is presented for the Austrian Dorfertal (Kals) Alpine catchment, containing snow and subsurface water storages and the associated flows, including streamflow. The DSNN is trained, validated, and tested on ~40 years of daily streamflow. The DSNN is first trained and tested with streamflow data generated by a conceptual model. This tests its ability to estimate the magnitude and dynamics of internal storages (snow water equivalent, subsurface water storage) and flows (evapotranspiration, sublimation, snowmelt, seepage).

The DSNN proves capable of reproducing the system states and fluxes calculated by the conceptual model with a satisfactory level of precision. Performance decreases when measurement error is added to the artificially generated streamflow data before training. To explore its predictive performance, the DSNN is applied to measured streamflow data for the Dorfertal. Multiple DSNN setups are compared: some represent all flows as neural network components, while others represent only a subset of flows this way and use a standard conceptual model (e.g. a linear reservoir) for the rest. Preliminary results indicate that, in most setups, the DSNN outperforms a standard conceptual model trained on the same streamflow data in predictive performance, with test NSE values of 0.74 and 0.71, respectively. This preliminary result suggests that the DSNN is a promising approach for blending process-based and neural-network-based modelling. It is also promising for training (i.e. calibrating) neural network models on measurements of multiple hydrological variables (e.g. streamflow, snow depth, groundwater, evapotranspiration), since these are all explicitly represented by the DSNN and can thus be incorporated in the loss function.
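The DSNN idea of explicit storages connected by exchangeable flow components can be sketched as a forward simulation. Each named flow is either a small neural network or a known formula, and storages are updated by mass balance. The Python sketch below makes several assumptions: fixed hypothetical weights, a rain/snow split at 0 °C, and seepage as the only source of streamflow. Training, which per the abstract backpropagates gradients through all time steps, is not shown.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Flux = Callable[[float, float], float]  # (storage, forcing) -> flow per time step


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z)); keeps learned flows non-negative."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def tiny_mlp(w_in: Sequence[Tuple[float, float]], b_in: Sequence[float],
             w_out: Sequence[float], b_out: float) -> Flux:
    """Hypothetical one-hidden-layer network standing in for a learned flow component."""
    def flux(storage: float, forcing: float) -> float:
        hidden = [math.tanh(a * storage + c * forcing + b) for (a, c), b in zip(w_in, b_in)]
        return softplus(b_out + sum(w * h for w, h in zip(w_out, hidden)))
    return flux


def run_dsnn(flows: Dict[str, Flux], precip: Sequence[float], temp: Sequence[float],
             snow: float = 0.0, subsurface: float = 50.0):
    """Step two storages (snow, subsurface) forward; every flow is explicitly quantified."""
    streamflow: List[float] = []
    evaporated = 0.0
    for p, t in zip(precip, temp):
        snowfall = p if t < 0.0 else 0.0
        snow += snowfall
        melt = min(snow, flows["snowmelt"](snow, t))        # cannot melt more than is stored
        snow -= melt
        subsurface += (p - snowfall) + melt
        et = min(subsurface, flows["evapotranspiration"](subsurface, t))
        subsurface -= et
        seepage = min(subsurface, flows["seepage"](subsurface, 0.0))
        subsurface -= seepage
        evaporated += et
        streamflow.append(seepage)
    return streamflow, evaporated, snow, subsurface


flows: Dict[str, Flux] = {
    "snowmelt": tiny_mlp([(0.05, 0.3), (0.0, 0.5)], [0.0, -1.0], [1.5, 2.0], -1.0),  # neural component
    "evapotranspiration": lambda s, t: 0.002 * s * max(t, 0.0),                      # known formula
    "seepage": lambda s, _: 0.05 * s,                                               # linear reservoir
}
precip = [5.0, 0.0, 12.0, 3.0, 0.0, 8.0]
temp = [-4.0, -2.0, 1.0, 4.0, 6.0, 3.0]
q, et_total, snow_end, sub_end = run_dsnn(flows, precip, temp)
print(q, et_total, snow_end, sub_end)
```

Every storage and flow is an explicit variable, so additional observations (for example snow water equivalent) could enter the loss alongside streamflow.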

How to cite: Karssenberg, D.: Dynamical system neural network for hydrological modelling, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-7382, https://doi.org/10.5194/egusphere-egu25-7382, 2025.

10:55–11:05 | EGU25-6278 | On-site presentation
Anneli Guthke, Philipp Luca Reiser, and Paul Bürkner

Physics-based hydrological modelling provides great opportunities for risk assessment and water resources management. However, diagnostic model evaluation and quantitative uncertainty assessment remain a challenge: (1) Model choices, boundary conditions, and prior assumptions about input, parameter or data uncertainty might be hard to formulate or justify; (2) rigorous propagation of uncertainties struggles when the analysed model structure is not “true”, and (3) a full propagation of uncertainties is often computationally prohibitive for complex models.

Alternative approaches extract information directly from data, thereby avoiding overly strict physics-based constraints and the pitfalls of uncertainty quantification. Challenges of these data-driven approaches include limited explainability, transparency, and transferability to unseen scenarios.

To explore the frontier where these two perspectives (should) converge, we investigate surrogate models (computationally cheaper, data-driven representations of complex models) as a binding link with several potential benefits:

1. They alleviate the computational burden and thereby allow a fully Bayesian uncertainty analysis.
2. They are flexible enough to overcome structural deficits of the original complex model, enabling better predictive performance.
3. Because they are data-driven, information from available data can be fused elegantly into their training process.

Methodologically, we propose a weighted, data-integrated training of surrogates via two competing approaches that differ technically and also philosophically. The two approaches reveal complementary insights into the strengths and weaknesses of the physics-based model and into the additional information in the available data, thereby facilitating deeper system understanding and improved (hybrid) modelling. We demonstrate the proposed workflow on didactic examples and a real-world case study. We expect this approach to be generally useful for modelling dynamic systems, as it contributes to more realistic uncertainty assessment and opens up avenues for model development.
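The abstract does not detail its two competing approaches. The Python sketch below therefore illustrates only the generic idea behind a weighted, data-integrated training objective: a surrogate is fitted jointly to physics-model runs and to observations, with a weight setting how much each source counts. The linear surrogate, the closed-form weighted least-squares fit, and all names and values are illustrative assumptions. In particular, this is a point estimate, not the Bayesian treatment described above.

```python
from typing import Sequence, Tuple


def weighted_linear_fit(x: Sequence[float], y: Sequence[float],
                        w: Sequence[float]) -> Tuple[float, float]:
    """Closed-form weighted least squares for y = a + b * x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def train_surrogate(x_sim, y_sim, x_obs, y_obs, knowledge_weight: float) -> Tuple[float, float]:
    """Fit one surrogate to pooled physics-model runs and observations.

    knowledge_weight in [0, 1] splits the total weight between the two sources,
    each share spread evenly over its own samples.
    """
    w_sim = [knowledge_weight / len(x_sim)] * len(x_sim)
    w_obs = [(1.0 - knowledge_weight) / len(x_obs)] * len(x_obs)
    return weighted_linear_fit(list(x_sim) + list(x_obs), list(y_sim) + list(y_obs), w_sim + w_obs)


# Hypothetical setting: the physics model says y = 2x; the observations follow y = 1 + 3x
x_sim = [0.0, 1.0, 2.0, 3.0, 4.0]
y_sim = [2.0 * v for v in x_sim]
x_obs = [1.0, 2.0, 3.0]
y_obs = [1.0 + 3.0 * v for v in x_obs]
for weight in (1.0, 0.5, 0.0):
    print(weight, train_surrogate(x_sim, y_sim, x_obs, y_obs, weight))
```

Sweeping the weight from 1 to 0 moves the surrogate from reproducing the physics model to reproducing the data. The gap between the two fits indicates where the model and the observations disagree.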

How to cite: Guthke, A., Reiser, P. L., and Bürkner, P.: Training Surrogates with Knowledge and Data: A Bayesian Hybrid Modelling Strategy, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-6278, https://doi.org/10.5194/egusphere-egu25-6278, 2025.

11:05–11:15 | EGU25-15436 | ECS | On-site presentation
Bjarte Beil-Myhre, Rajeev Shrestha, and Bernt Viggo Matheussen

The field of hydrology has undergone significant transformation over the past decade, driven by advancements in machine learning and data-driven techniques. A key breakthrough came from the work of Kratzert et al. (2018), who demonstrated that purely data-driven LSTM models could outperform traditional hydrological models in over 600 catchments across North America. However, while these models significantly improve predictive performance, they often sacrifice interpretability and explainability.

To address this trade-off, researchers have explored new approaches that merge physical principles with data-driven methods. One promising innovation is the concept of differentiable modeling, introduced by Chen et al. in 2022. This approach transforms physical models into differentiable functions, allowing neural networks to represent and learn model parameters. By doing so, differentiable modeling enhances flexibility while maintaining a foundation in physical principles.

This research presents a novel differentiable hydrological model called the Differentiable Distributed Regression Model (dDRM). The dDRM combines the principles of differentiable modeling with the structure of a lumped conceptual model that uses a simplified representation of physics (a "smooth" HBV model). The design is inspired by the simplicity of the LSTM model, which aggregates data at the catchment level rather than relying on a grid-based representation. Accordingly, the dDRM uses four equally sized elevation zones instead of grid cells. These zones inherently reflect differences in hydrological processes, such as precipitation, temperature, and snowmelt dynamics, enabling the model to account for spatial heterogeneity while maintaining computational efficiency.
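To make a threshold process such as HBV's degree-day snowmelt differentiable, a hard max(T - TT, 0) can be replaced by a smooth softplus, so that gradients remain informative near and below the threshold. The Python sketch below combines this with a constant temperature lapse rate across four elevation zones. The degree-day factor, threshold, lapse rate (-0.0065 °C per m), zone elevations, and the softplus choice itself are assumptions for illustration, not the dDRM's published formulation.

```python
import math
from typing import List, Sequence


def smooth_relu(z: float, sharpness: float = 5.0) -> float:
    """Softplus approximation of max(z, 0); larger sharpness approaches the hard threshold."""
    kz = sharpness * z
    return (max(kz, 0.0) + math.log1p(math.exp(-abs(kz)))) / sharpness


def zone_temperatures(t_station: float, station_elev: float,
                      zone_elevs: Sequence[float], lapse_rate: float = -0.0065) -> List[float]:
    """Shift station temperature (°C) to each elevation zone with a constant lapse rate (°C/m)."""
    return [t_station + lapse_rate * (z - station_elev) for z in zone_elevs]


def smooth_melt(temps: Sequence[float], degree_day_factor: float = 3.0,
                threshold: float = 0.0) -> List[float]:
    """Differentiable potential melt (mm/day) per zone; the hard version is ddf * max(T - TT, 0)."""
    return [degree_day_factor * smooth_relu(t - threshold) for t in temps]


zones = [400.0, 900.0, 1400.0, 1900.0]   # hypothetical mid-elevations of four equally sized zones
temps = zone_temperatures(2.0, station_elev=500.0, zone_elevs=zones)
melt = smooth_melt(temps)

# Central finite difference of melt w.r.t. temperature in a zone just below the 0 °C threshold;
# with the hard max() this derivative would be exactly zero.
eps = 1e-6
t_near = temps[1]
grad = (smooth_melt([t_near + eps])[0] - smooth_melt([t_near - eps])[0]) / (2 * eps)
print(temps, melt, grad)
```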

By leveraging the principles of differentiable modeling, the dDRM achieves a balance between explainability and predictive performance. To evaluate its performance, we tested the dDRM in a gauged setting across sixty-three catchments in southern Norway, using only precipitation and temperature as input data. For benchmarking, we also trained an LSTM model on the same catchments.

Our results demonstrate that the dDRM outperforms the fine-tuned LSTM model in both daily predictions and cumulative runoff volumes. These findings underscore the potential of differentiable hydrological models to bridge the gap between performance and interpretability. By combining physical principles with data-driven techniques, the dDRM provides a pathway toward more effective and understandable forecasting tools in hydrology.

How to cite: Beil-Myhre, B., Shrestha, R., and Matheussen, B. V.: The Differentiable Distributed Regression Model (dDRM) Balancing Explainability and Predictive Performance, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-15436, https://doi.org/10.5194/egusphere-egu25-15436, 2025.

11:15–11:25
11:25–11:35 | EGU25-7770 | On-site presentation
Yi Zheng, Chao Wang, and Shijie Jiang

Accurately simulating large-scale water dynamics is important for managing water resources, addressing climate change impacts, and understanding hydrological variability. Despite advances in hydrological modeling, simulating water fluxes and states at global or regional scales remains challenging due to the complexity of distributed processes and limited understanding of key components. Encoding physical knowledge in deep neural networks (NNs) for differentiable modeling offers a promising solution but has yet to be fully realized for distributed hydrological models, especially for processes such as river routing.

This study presents a novel differentiable modeling framework that bridges physical and data-driven approaches for distributed hydrological modeling. The framework encodes a large-scale hydrological model (i.e., HydroPy) as a neural network. It incorporates an additional NN to map spatially distributed parameters from local climate and land attributes, and employs NN-based modules to represent poorly understood processes. Multi-source observations are used to constrain the system in an end-to-end manner, with the Amazon Basin as a case study to demonstrate the framework's applicability and effectiveness.

Results show that the developed model improves simulation accuracy by 30–40% compared to the original hydrological model. Replacing the Penman-Monteith formulation with an NN produces more realistic potential evapotranspiration estimates. SHAP analysis of the NN parameterization further reveals how climate and land attributes regulate the spatial variability of key parameters. Overall, by integrating physical realism with the flexibility of machine learning, this framework addresses critical limitations of traditional hydrological models. It provides a scalable, interpretable approach to advance large-scale hydrological modeling and address pressing water and climate challenges.
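The parameter-mapping network described above can be pictured as a function from each grid cell's attributes to physically bounded model parameters. The Python sketch below uses a single linear layer with a sigmoid rescaled to each parameter's range, which is a common way to keep learned parameters physically admissible. The attribute set, parameter names, bounds, and weights are hypothetical and do not come from HydroPy or the study.

```python
import math
from typing import Dict, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def regionalize(attributes: Sequence[float],
                weights: Dict[str, Sequence[float]],
                biases: Dict[str, float],
                bounds: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """p = lower + (upper - lower) * sigmoid(w . a + b) for every parameter."""
    params = {}
    for name, (lower, upper) in bounds.items():
        z = biases[name] + sum(w * a for w, a in zip(weights[name], attributes))
        params[name] = lower + (upper - lower) * sigmoid(z)
    return params


# Hypothetical normalised attributes of one cell: aridity index, sand fraction, forest cover
cell = [0.7, 0.3, 0.5]
bounds = {"soil_storage_max_mm": (50.0, 500.0), "recession_coeff_per_day": (0.01, 0.5)}
weights = {"soil_storage_max_mm": [-1.2, -0.8, 1.5], "recession_coeff_per_day": [0.4, 1.1, -0.6]}
biases = {"soil_storage_max_mm": 0.2, "recession_coeff_per_day": -0.5}
print(regionalize(cell, weights, biases, bounds))
```

In end-to-end training, these weights would be updated by gradients flowing back from the hydrological model's simulated outputs; here they are fixed.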

How to cite: Zheng, Y., Wang, C., and Jiang, S.: Advancing distributed hydrological modeling with hybrid machine learning, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-7770, https://doi.org/10.5194/egusphere-egu25-7770, 2025.

11:35–11:45 | EGU25-2740 | ECS | On-site presentation
Ngo Nghi Truyen Huynh, Pierre-André Garambois, Benjamin Renard, and Jérôme Monnier

Machine learning (ML) methods have been utilized in hydrology for decades. Recently, hybrid approaches that combine data-driven techniques with process-based models have gained attention, highlighting the complementary strengths of ML and physical models. However, the explicability and adaptability of such hybrid models remain open questions. This work introduces a general framework for incorporating neural networks (NNs) and ML techniques into a regionalizable, spatially distributed hydrological model. As a case study, a simple NN is employed to correct internal fluxes within a conceptual GR hydrological model that allows analytical integration. The corresponding hybrid ordinary differential equation set is integrated with an implicit numerical scheme solved by the Newton-Raphson method. Implementation in Fortran-based code supports differentiability, enabling the computation of the cost gradient through a combination of an adjoint model and analytical NN gradients. Results over a large catchment sample show promising improvements in model accuracy and provide insights into hydrological behaviors through interpretable NN outputs. These findings demonstrate the framework's potential to advance hybrid hydrological modeling by enhancing explicability and adaptability. Additionally, the proposed framework offers flexibility for integration into other modeling chains and applications across diverse geophysical models.
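The implicit integration step described above can be sketched for a single storage. The backward Euler update S_{n+1} = S_n + Δt f(S_{n+1}) is solved for S_{n+1} with Newton-Raphson, which needs the derivative of the right-hand side, including the neural correction term. The Python sketch below uses a hypothetical linear-reservoir right-hand side plus a small tanh correction whose derivative is written analytically. It does not reproduce the GR model structure or the authors' Fortran implementation.

```python
import math
from typing import Callable


def implicit_euler_step(s_prev: float, dt: float,
                        rhs: Callable[[float], float],
                        drhs_ds: Callable[[float], float],
                        tol: float = 1e-12, max_iter: int = 50) -> float:
    """Solve g(s) = s - s_prev - dt * rhs(s) = 0 for s with Newton-Raphson."""
    s = s_prev                                  # previous state as the initial guess
    for _ in range(max_iter):
        g = s - s_prev - dt * rhs(s)
        dg = 1.0 - dt * drhs_ds(s)              # dg/ds, using the analytical derivative
        step = g / dg
        s -= step
        if abs(step) < tol:
            return s
    raise RuntimeError("Newton-Raphson did not converge")


def make_hybrid_rhs(precip: float, k: float, a: float, b: float):
    """dS/dt = P - k*S + a*tanh(b*S); the tanh term is a hypothetical stand-in for the NN correction."""
    def rhs(s: float) -> float:
        return precip - k * s + a * math.tanh(b * s)

    def drhs_ds(s: float) -> float:
        return -k + a * b * (1.0 - math.tanh(b * s) ** 2)

    return rhs, drhs_ds


rhs, drhs = make_hybrid_rhs(precip=4.0, k=0.2, a=0.5, b=0.05)
s_next = implicit_euler_step(s_prev=30.0, dt=1.0, rhs=rhs, drhs_ds=drhs)
print(s_next, s_next - 30.0 - 1.0 * rhs(s_next))   # second value: residual of the implicit equation
```

Without the correction term (a = 0), the step reduces to the closed-form linear-reservoir update S_{n+1} = (S_n + Δt P) / (1 + Δt k), which is a convenient check.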

How to cite: Huynh, N. N. T., Garambois, P.-A., Renard, B., and Monnier, J.: A General Framework for Integrating Neural Networks into Numerical Resolution Methods for Spatially Distributed Hydrological Models, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-2740, https://doi.org/10.5194/egusphere-egu25-2740, 2025.

11:45–12:05
|
EGU25-20878
|
ECS
|
solicited
|
Virtual presentation
Yalan Song, Chaopeng Shen, Haoyu Ji, and Farshid Rahmani

Continental and global water models have long suffered from slow progress and inadequate predictive power because they cannot effectively assimilate information from big data. While artificial intelligence (AI) models greatly improve performance, purely data-driven approaches offer limited interpretability and generalization. One promising avenue is “differentiable” modeling, which seamlessly connects neural networks with physical modules and trains them together to deliver real-world benefits in operational systems. Differentiable modeling (DM) can efficiently learn from big data to reach state-of-the-art accuracy while preserving interpretability and physical constraints, promising superior generalization, predictions of untrained intermediate variables, and the potential for knowledge discovery. Here we demonstrate the practical relevance of a high-resolution, multiscale water model for operational continental- and global-scale water resources assessment (https://bit.ly/3NnqDNB). Not only does it achieve significant improvements in streamflow simulation compared with established national and global water models, but it also produces much more reliable depictions of interannual changes in large-river streamflow, freshwater inputs to estuaries, and groundwater recharge.
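To make "training neural networks and physical modules together" concrete, the sketch below couples a hypothetical one-layer parameter network (a logistic map from a catchment attribute x to a recession coefficient k) with a linear-reservoir physical module, and trains the network weights end to end by propagating derivatives through the simulation with forward sensitivities. The reservoir, the synthetic "observations" and the learning rate are illustrative assumptions; operational differentiable models use deep networks, multi-process physical modules and automatic differentiation frameworks instead of hand-derived sensitivities.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def reservoir(precip, k, s0=0.0):
    """Linear reservoir Q_t = k*S_t, S_{t+1} = S_t + P_t - Q_t.
    Returns simulated flows and their forward sensitivities dQ_t/dk."""
    s, ds = s0, 0.0
    q, dq = [], []
    for p in precip:
        q.append(k * s)
        dq.append(s + k * ds)
        s, ds = s + p - k * s, ds - s - k * ds  # update state and dS/dk together
    return q, dq


def loss_and_grad(a, b, attrs, precip, q_obs):
    """MSE over all catchments and its gradient w.r.t. the network weights (a, b)."""
    loss = ga = gb = 0.0
    n = 0
    for x, obs in zip(attrs, q_obs):
        k = sigmoid(a * x + b)  # "parameter network": attribute -> k
        q, dq = reservoir(precip, k)
        dk_dz = k * (1.0 - k)
        for qt, dqt, ot in zip(q, dq, obs):
            r = qt - ot
            loss += r * r
            ga += 2.0 * r * dqt * dk_dz * x  # chain rule through model and network
            gb += 2.0 * r * dqt * dk_dz
            n += 1
    return loss / n, ga / n, gb / n


def train(attrs, precip, q_obs, lr=0.05, epochs=500):
    """Plain gradient descent on the network weights."""
    a, b = 0.0, 0.0
    for _ in range(epochs):
        _, ga, gb = loss_and_grad(a, b, attrs, precip, q_obs)
        a, b = a - lr * ga, b - lr * gb
    return a, b


# Synthetic "truth": k depends on a catchment attribute x via k = sigmoid(1.5 x - 0.5).
PRECIP = [5.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 8.0, 0.0, 0.0] * 3
ATTRS = [-1.0, -0.5, 0.0, 0.5, 1.0]
Q_OBS = [reservoir(PRECIP, sigmoid(1.5 * x - 0.5))[0] for x in ATTRS]
a_fit, b_fit = train(ATTRS, PRECIP, Q_OBS)
```

The network is trained only through the mismatch of simulated flows, never on k directly, which is the essential property that lets such models learn parameters that are not observed.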

How to cite: Song, Y., Shen, C., Ji, H., and Rahmani, F.: High-Resolution Differentiable Models for Operational National and Global Water Modeling and Assessment, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-20878, https://doi.org/10.5194/egusphere-egu25-20878, 2025.

12:05–12:15

Posters on site: Tue, 29 Apr, 16:15–18:00 | Hall A

The posters scheduled for on-site presentation are only visible in the poster hall in Vienna. If authors uploaded their presentation files, these files are linked from the abstracts below.
Display time: Tue, 29 Apr, 14:00–18:00
Chairpersons: Dapeng Feng, Lu Li, Basil Kraft
A.25
|
EGU25-292
|
ECS
Hossein Abbasizadeh, Petr Maca, and Martin Hanel

While precipitation is the primary driver of streamflow variability, temperature also plays a significant role, influencing streamflow by modifying precipitation, evapotranspiration, and soil moisture. Although this relationship is often studied using hydrological or black-box models, the causal effect of temperature dynamics on streamflow at the catchment scale is not fully understood. This study investigates the causal relationships among precipitation, temperature, and streamflow time series using the PCMCI+ causal discovery method. Given the inferred causal structure, the total causal effect of temperature on streamflow is then estimated. The analysis is conducted on the CAMELS-GB (671 catchments) and LamaH (859 catchments) datasets to study the causal effects of temperature on streamflow across a wide range of catchments with different climatic and physiographic characteristics. Preliminary results indicate that temperature significantly influences streamflow within a specific temperature range, and that this range changes over time for most catchments. The shift in the range within which temperature has strong causal effects on streamflow might be due to changes in catchment storage and precipitation patterns, leading to a change in catchment response to temperature. These findings highlight the importance of identifying the relationship between temperature and streamflow variability from a cause-and-effect perspective, and suggest that incorporating causal information can improve the modelling of hydrological systems under a changing climate.
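Once a causal graph is available, a total causal effect can be estimated by adjusting for the right set of variables. The sketch below assumes a linear, contemporaneous structural model with invented coefficients, in which precipitation P both lowers temperature T and raises streamflow Q, so P confounds the T → Q effect. The effect of T on Q is then estimated by ordinary least squares with P as the back-door adjustment set. PCMCI+ itself (for example as implemented in the Tigramite package) is not reproduced here, time lags are ignored, and the authors' effect estimator may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
p = rng.gamma(2.0, 1.0, n)                       # precipitation (confounder)
t = -0.5 * p + rng.normal(0.0, 1.0, n)           # temperature, lowered on wet days
q = 1.2 * p - 0.8 * t + rng.normal(0.0, 0.5, n)  # streamflow; true effect of T is -0.8


def linear_effect(treatment, outcome, adjustment=()):
    """OLS coefficient of `treatment`, adjusting for the variables in `adjustment`."""
    X = np.column_stack([np.ones_like(treatment), treatment, *adjustment])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef[1]


naive = linear_effect(t, q)           # biased by the path T <- P -> Q
adjusted = linear_effect(t, q, (p,))  # back-door adjustment for P
```

In this toy system the unadjusted regression roughly doubles the apparent effect of temperature (about −1.6 instead of −0.8), which is why the causal structure must be known before effects are estimated.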

How to cite: Abbasizadeh, H., Maca, P., and Hanel, M.: Influence of Temperature on Streamflow Dynamics: A Multi-Catchment Analysis Using the PCMCI+ Causal Discovery Algorithm, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-292, https://doi.org/10.5194/egusphere-egu25-292, 2025.

A.26
|
EGU25-1863
|
ECS
Qiang Ye, Zijie Huang, Qiang Zheng, and Lingzao Zeng

Accurate modeling of soil water movement in the unsaturated zone is essential for effective soil and water resources management. Physics-informed neural networks (PINNs) offer promising potential for this purpose but must be retrained whenever the initial or boundary conditions change, which makes it difficult to adapt them to variable natural conditions. To address this issue, and inspired by operator learning, which is more broadly applicable than function learning, we develop a physics-informed deep operator network (PI-DeepONet) that integrates physical principles and observed data to simulate soil water movement under variable boundary conditions. In a numerical case study, PI-DeepONet achieves the best performance among three modeling strategies when predicting soil moisture dynamics across different testing areas, especially in the extrapolation case. Guided by both data and physical mechanisms, PI-DeepONet captures spatio-temporal moisture variations more accurately than HYDRUS in a real-world scenario. Furthermore, by incorporating known prior physical information, PI-DeepONet successfully infers constitutive relationships and reconstructs a missing boundary flux condition from limited data, providing a unified solution for both forward and inverse problems. This study is the first to develop a PI-DeepONet specifically for modeling real-world soil water movement, highlighting its potential to improve predictive accuracy and reliability in vadose zone modeling by combining data-driven approaches with physical principles.
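The operator-learning idea behind DeepONet can be sketched as follows: a branch network encodes the input function (here, a hypothetical surface-flux boundary condition sampled at a fixed number of sensor times), a trunk network encodes the query location (depth z, time t), and the prediction is the inner product of the two embeddings. The weights below are random and untrained, the layer sizes are arbitrary assumptions, and the physics-informed part (penalizing the Richards-equation residual, typically via automatic differentiation) is omitted, so this shows only the architecture, not the authors' PI-DeepONet.

```python
import numpy as np

rng = np.random.default_rng(42)
N_SENSORS = 20  # boundary-flux samples fed to the branch net (assumed)
P = 32          # width of the shared latent basis (assumed)


def init_mlp(sizes):
    """Random (untrained) weights for a fully connected network."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, k)), np.zeros(k))
            for m, k in zip(sizes[:-1], sizes[1:])]


def mlp(params, x):
    """tanh hidden layers, linear output layer."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x


branch = init_mlp([N_SENSORS, 64, P])  # encodes the input function (flux history)
trunk = init_mlp([2, 64, P])           # encodes a query point (z, t)
bias = 0.0


def deeponet(flux_samples, query_points):
    """flux_samples: (batch, N_SENSORS); query_points: (n_points, 2).
    Returns predictions of shape (batch, n_points)."""
    b = mlp(branch, flux_samples)   # (batch, P)
    tr = mlp(trunk, query_points)   # (n_points, P)
    return b @ tr.T + bias


# Three different boundary-flux histories evaluated on a 10 x 5 grid of (z, t) points.
flux = rng.uniform(0.0, 1.0, (3, N_SENSORS))
z, t = np.meshgrid(np.linspace(0.0, 1.0, 10), np.linspace(0.0, 1.0, 5))
points = np.column_stack([z.ravel(), t.ravel()])
theta = deeponet(flux, points)  # shape (3, 50)
```

Because a new boundary condition only changes the branch input, a trained network of this form can be evaluated for unseen conditions without retraining, which is the property contrasted with PINNs above.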

How to cite: Ye, Q., Huang, Z., Zheng, Q., and Zeng, L.: Predicting Water Movement in Unsaturated Soil Using Physics-Informed Deep Operator Networks, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-1863, https://doi.org/10.5194/egusphere-egu25-1863, 2025.

A.27
|
EGU25-2661
Sheng Ye, Jiyu Li, Yifan Chai, Lin Liu, Murugesu Sivapalan, and Qihua Ran

Recent applications have demonstrated the strength of deep learning (DL) in information extraction and prediction. However, its limited interpretability has slowed its adoption as a tool for advancing hydrologic understanding. Here we present a framework that uses explainable artificial intelligence (XAI) as a diagnostic tool to investigate distributed soil moisture dynamics within a watershed. Soil moisture and its movement simulated by a physically based hydrologic model were used to train a long short-term memory (LSTM) network, whose feature attributions were then evaluated with XAI methods. The aggregated feature importance shows an abrupt rise at model nodes located in the riparian area, indicating threshold behavior in runoff generation and the development of hydrologic connectivity at the watershed scale, which helps explain the rapid increase in streamflow. This work demonstrates the potential of XAI to uncover underlying physical mechanisms and to help develop new theories from observed data.
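The abstract does not name the XAI methods used. As one common example of gradient-based feature attribution, the sketch below computes integrated gradients for a toy differentiable model (a logistic unit standing in for a trained LSTM), approximating the path integral from a baseline input with a midpoint Riemann sum. The weights, inputs and baseline are invented for illustration.

```python
import math

W = [1.5, -2.0, 0.7]  # hypothetical trained weights
B = 0.1


def model(x):
    """Toy differentiable model: logistic unit."""
    z = sum(w * xi for w, xi in zip(W, x)) + B
    return 1.0 / (1.0 + math.exp(-z))


def grad(x):
    """Analytical gradient of model(x) with respect to each input."""
    f = model(x)
    return [f * (1.0 - f) * w for w in W]


def integrated_gradients(x, baseline, steps=200):
    """Midpoint Riemann approximation of IG along the straight path baseline -> x."""
    total = [0.0] * len(x)
    for k in range(1, steps + 1):
        alpha = (k - 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        total = [t + g for t, g in zip(total, grad(point))]
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


x = [0.8, 0.5, 1.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline)
```

A useful check is completeness: the attributions sum to model(x) − model(baseline), so they can be aggregated across inputs (or, in a distributed model, across nodes) without losing mass.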

How to cite: Ye, S., Li, J., Chai, Y., Liu, L., Sivapalan, M., and Ran, Q.: Using explainable artificial intelligence as a diagnostic tool, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-2661, https://doi.org/10.5194/egusphere-egu25-2661, 2025.

A.28
|
EGU25-3871
|
ECS
Karen Elaine Dunbar, Heather McGrath, and Usman T. Khan

Floods are the costliest hazard in Canada in terms of direct infrastructure damage. Flood susceptibility modelling (FSM) identifies flood hazard areas; its input features depend on the study area and modelling methods, which affect the reliability and accuracy of flood susceptibility (FS) maps. Typical features in FSM are static topographical inputs (digital elevation model, land use, wetness index, height above nearest drainage, etc.). Although meteorological variables have been included in FSM, they often have low temporal resolution (e.g., annual), and seasonal meteorological variables are often not included. The 2023 Canadian national FS map was developed using machine learning (ML) ensembles, with features that include historical flood events and 30 years of climate data. This research initiates an update of the existing Canadian FS map by expanding the suite of input features and comparing the impact of three feature selection methods (partial correlation, partial mutual information, and combined neural pathway strength) on three types of ML algorithms: random forest, artificial neural network (ANN), and convolutional neural network (CNN). The expanded feature set includes geospatial indices and flood-specific meteorological data such as spring temperature, precipitation, and vapour pressure. Data from the seasons preceding specific flood events are also included. Preliminary findings from the feature selection methods show that including seasonal flood-specific meteorological data provides important information that leads to better model performance. The performance of the three algorithms was comparable. Random forest with extreme gradient boosting achieved the highest performance (AUC = 0.98, F1 = 0.94), followed by the CNN (AUC = 0.96, F1 = 0.90). The ANN ensemble with leave-one-out cross-validation had the lowest performance (AUC = 0.91, F1 = 0.85). The results contribute to the development of an improved national FS map for Canada.
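The AUC and F1 scores reported above can be computed from predicted susceptibility scores and binary flood labels. The sketch below computes AUC through its rank interpretation (the probability that a randomly chosen flooded cell scores higher than a randomly chosen non-flooded cell, with ties counted as one half) and F1 from a thresholded map. The decision threshold of 0.5 and the scores and labels are assumptions for illustration.

```python
def auc(scores, labels):
    """ROC AUC as the probability that a positive outranks a negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else (0.5 if sp == sn else 0.0)
    return wins / (len(pos) * len(neg))


def f1(scores, labels, threshold=0.5):
    """F1 score of the binary map obtained by thresholding susceptibility scores."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(pred, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(pred, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(pred, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]  # hypothetical cell susceptibilities
labels = [1, 1, 1, 0, 0, 0]               # 1 = flooded, 0 = not flooded
```

Here AUC is 8/9 (one flooded cell scores below one dry cell) and F1 is 2/3. AUC is threshold-free, whereas F1 depends on the chosen cut-off, which is worth reporting alongside the scores.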

How to cite: Dunbar, K. E., McGrath, H., and Khan, U. T.: Enhancing flood susceptibility modelling in Canada: Integrating seasonal meteorological data, feature selection and machine learning approaches, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-3871, https://doi.org/10.5194/egusphere-egu25-3871, 2025.

A.29
|
EGU25-5727
|
ECS
Lea Faber, Karoline Wiesner, Ting Tang, Yoshihide Wada, and Thorsten Wagener

Model Intercomparison Projects in the Earth sciences have shown that the outputs of Earth System Models often vary widely and can therefore give quite different results, with no single model consistently outperforming the others. Examples include Global Water Models (GWMs) as well as Global Climate Models (GCMs). The high computational cost of running such models makes comprehensive statistical analyses challenging, a common issue with many complex models today. Machine learning models have become popular surrogates for slow process-based models because of their computational speed, at least once trained. This speed makes it possible to use techniques from Explainable AI (XAI) to analyze the behavior of the surrogate model.

Here, we analyze long-term averages of the GWM ‘Community Water Model’ (CWatM) for different parts of the global domain for actual evapotranspiration Ea, total runoff Q, and groundwater recharge R. We train an artificial neural network on the model's input and output data and use three different strategies to assess the importance of input data: LassoNet for subset selection and feature ranking, along with Sobol' indices and DeepSHAP for interpretability. Our results show that subset selection can effectively reduce model complexity before XAI analysis. For some hydrological domains, the number of relevant input variables for a chosen output drops to fewer than 15 of the 98 model inputs, while others remain more complex, requiring many variables to achieve R² > 0.8.
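First-order Sobol' indices, one of the three importance measures mentioned above, can be estimated by Monte Carlo. The sketch below uses the Saltelli (2010) first-order estimator on a hypothetical additive test function whose indices are known analytically: for y = 2·x1 + x2 + 0·x3 with independent U(0, 1) inputs, S1 = 0.8, S2 = 0.2 and S3 = 0. It is a minimal illustration of the estimator, not the surrogate-based workflow of the study.

```python
import numpy as np


def model(x):
    """Additive test function with known indices: S1 = 0.8, S2 = 0.2, S3 = 0."""
    return 2.0 * x[:, 0] + 1.0 * x[:, 1] + 0.0 * x[:, 2]


def sobol_first_order(f, d, n=200_000, seed=0):
    """Saltelli (2010) estimator of first-order Sobol' indices for U(0, 1) inputs."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(size=(n, d))
    b = rng.uniform(size=(n, d))
    fa, fb = f(a), f(b)
    var = np.var(np.concatenate([fa, fb]))
    s = np.empty(d)
    for i in range(d):
        ab = a.copy()
        ab[:, i] = b[:, i]  # A with column i taken from B
        s[i] = np.mean(fb * (f(ab) - fa)) / var
    return s


indices = sobol_first_order(model, d=3)
```

With a fast surrogate, the (d + 2)·n model evaluations this estimator needs are cheap, which is exactly why surrogates make such variance-based analyses feasible for slow global models.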

How to cite: Faber, L., Wiesner, K., Tang, T., Wada, Y., and Wagener, T.: Using Explainable Artificial Intelligence (XAI) to Analyze the Behavior of Global Water Models, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-5727, https://doi.org/10.5194/egusphere-egu25-5727, 2025.

A.30
|
EGU25-5052
|
ECS
Aseel Mohamed, Awad M. Ali, Ahmed Ali, Osama Hassan, Mohamed E. Elbasheer, and Mutaz Abdelaziz

Water resources management depends heavily on hydrological modeling for reservoir operation and risk mitigation, especially in data-scarce regions. Hybrid approaches that combine artificial intelligence and conceptual models offer great potential for accurate streamflow prediction. However, their implementation can be time-consuming, and they can be applied in different configurations. This study comprehensively compares two promising hybrid frameworks: the Conceptual-Data-Driven Approach (CDDA) and the Ensemble Approach. The analysis was conducted in the Upper Blue Nile Basin in Ethiopia over the period 2002–2019. Six baseline models were developed: CNN-LSTM (data-driven), NAM and HBV-Light (lumped), and SWAT+, WEAP, and HEC-HMS (semi-distributed). All models achieved NSE ≥ 0.85 during the validation period, with CNN-LSTM performing best (NSE = 0.94). Each model was integrated into the two hybrid frameworks using Random Forest (RF) or Artificial Neural Networks (ANN). Results showed that the Ensemble Approach outperformed CDDA when combining two conceptual models. ANN performed better than RF in both frameworks. Hybrid modeling significantly improved the semi-distributed models, while the lumped and data-driven models showed minimal benefits. In the Ensemble Approach, normal and extreme flows simulated by the semi-distributed models were best when supported by CNN-LSTM or lumped models. Our analysis also demonstrated the robustness of the Ensemble Approach in selecting the supporting model. These findings emphasize the value and feasibility of the Ensemble Approach for improving streamflow prediction and better supporting decision-making in data-scarce regions. Nevertheless, a thorough understanding of the opportunities in hybrid modeling requires further research with a specific focus on operational forecasting.
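The Nash-Sutcliffe efficiency (NSE) used to rank the models compares squared simulation errors with the variance of the observations: NSE = 1 is a perfect fit and NSE = 0 is no better than the observed mean. The sketch below computes it and, as a deliberately simplified stand-in for the ensemble idea, combines two model simulations with least-squares weights; the study itself uses random forests or ANNs as the combiner. All flow values are invented.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations of observations."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)


def linear_stack(obs, *sims):
    """Least-squares weights (plus intercept) combining several simulations."""
    X = np.column_stack([np.ones(len(obs))] + [np.asarray(s, float) for s in sims])
    w, *_ = np.linalg.lstsq(X, np.asarray(obs, float), rcond=None)
    return X @ w, w


obs = np.array([12.0, 30.0, 85.0, 140.0, 95.0, 40.0, 20.0, 15.0])
sim_semi = np.array([15.0, 25.0, 70.0, 120.0, 100.0, 50.0, 25.0, 18.0])
sim_lumped = np.array([10.0, 35.0, 90.0, 150.0, 80.0, 35.0, 18.0, 12.0])
combined, weights = linear_stack(obs, sim_semi, sim_lumped)
```

On the fitting period, a least-squares combination can never score below its best member, so any benefit of an ensemble must be judged on an independent validation period.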

How to cite: Mohamed, A., M. Ali, A., Ali, A., Hassan, O., E. Elbasheer, M., and Abdelaziz, M.: Can CNN-LSTM and lumped models improve (extreme) streamflow prediction of semi-distributed models? A comparative analysis of two hybrid frameworks, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-5052, https://doi.org/10.5194/egusphere-egu25-5052, 2025.

A.31
|
EGU25-10444
|
ECS
Christian Thöne, Annemarie Bäthge, and Robert Reinecke

Earth system data, measured by satellites and terrestrial stations and simulated by increasingly complex models, provide valuable information for identifying functional relationships within the Earth system. These relationships are essential for understanding complex interactions and predicting changes, for example in climatic or ecological processes, but they often occur only in certain spatiotemporal sections or within certain threshold values. With the increasing spatiotemporal resolution of remote sensing products and models, manual analysis is impractical, and hypothesis-driven approaches can leave hidden relationships undiscovered. Previous work proposed the SONAR (automated diScovery Of fuNctionAl Relationships) decision-tree algorithm to automatically search for functional relationships in Earth system data without a priori assumptions. We evaluated the functionality of this algorithm using artificially generated data. We tested whether the choice of statistical indicator (Pearson's r, Spearman's ρ, Kendall's τ, and mutual information) influences the functionality of the SONAR algorithm and which factors are important for identifying functional relationships. Using 1512 synthetic data sets and the SAMPI (Similarity of A Manifested and Prototypical decision tree Indicator) coefficient we developed, we demonstrate how the performance of the algorithm changes under different variations of the data sets, including the number of designated splits, the presence of interfering variables, and the strength and nature of the underlying functional relationships. In particular, we show which statistical indicator provides the best results under these conditions. The results demonstrate that the SONAR algorithm is very versatile, especially when employing the most reliable statistical indicator.
The SONAR algorithm could, therefore, have far-reaching applications, for example, in analyzing climatic patterns or investigating dependencies between environmental factors.
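SONAR's internals are not described in the abstract, so the sketch below only illustrates the general idea of a decision-tree split chosen by a statistical indicator. It scans candidate thresholds on a splitting variable z and scores each split by the branch-size-weighted squared Spearman correlation between x and y within the two branches. The scoring rule, the minimum leaf size and the synthetic data (x and y related only where z > 0.6) are assumptions; swapping `spearman` for another indicator is the kind of choice the study compares.

```python
import random


def ranks(values):
    """Ranks 0..n-1 (continuous data, so ties are assumed absent)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0.0] * len(values)
    for rank, idx in enumerate(order):
        r[idx] = float(rank)
    return r


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


def spearman(a, b):
    return pearson(ranks(a), ranks(b))


def best_split(z, x, y, thresholds, min_leaf=30):
    """Threshold on z maximising the size-weighted rho^2 of x and y in both branches."""
    n = len(z)
    best_t, best_score = None, float("-inf")
    for t in thresholds:
        left = [i for i in range(n) if z[i] <= t]
        right = [i for i in range(n) if z[i] > t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        score = sum(len(idx) / n * spearman([x[i] for i in idx], [y[i] for i in idx]) ** 2
                    for idx in (left, right))
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


rng = random.Random(1)
N = 1000
z = [rng.random() for _ in range(N)]
x = [rng.random() for _ in range(N)]
# y follows x only where z > 0.6; elsewhere it is unrelated noise.
y = [xi + rng.gauss(0.0, 0.02) if zi > 0.6 else rng.random() for xi, zi in zip(x, z)]
t_best, s_best = best_split(z, x, y, [i / 100 for i in range(10, 91)])
```

Squaring the correlation matters here: with a plain size-weighted |ρ| the score is nearly flat across thresholds for this kind of mixture, whereas ρ² peaks where the relationship switches on.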

How to cite: Thöne, C., Bäthge, A., and Reinecke, R.: The effects of different statistical indicators in the new decision-tree-based SONAR algorithm for automated detection of functional relationships in Big Earth Data, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-10444, https://doi.org/10.5194/egusphere-egu25-10444, 2025.

A.32
|
EGU25-12068
|
ECS
Annie Y.-Y. Chang, Elena Leonarduzzi, Christian M. Grams, and Vincent W. Humphrey

Like much of Europe, Switzerland is increasingly experiencing severe summer droughts and heatwaves, prompting the mandate for an advanced national drought monitoring and early warning system. A key component of this initiative is the generation of gridded soil moisture estimates that are spatially distributed, extending beyond measurement stations. Here, we present the concept of a novel physics-constrained land surface model emulator designed to produce high-resolution (e.g., finer than 250 m) gridded soil moisture estimates down to 2 m depth across Switzerland's diverse topography and climatic conditions.

This framework aims to integrate multi-source datasets, including in situ measurements and reanalysis products, to train a machine-learning-based (e.g., convolutional LSTM or XGBoost) hybrid emulator that ensures physically consistent outputs. Compared with conventional dynamical land surface models, an emulator has the advantage of being more computationally efficient and less constrained by the specific requirements of a given numerical model (in terms of input variables and technical dependencies). To fulfil the needs of a very diverse user community, ranging from numerical weather prediction to agricultural decision-making, the emulator should be optimized for multi-scale applications, from climatological analysis to near-real-time monitoring and medium-term forecasting.
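The abstract does not specify which physical constraints the planned emulator will use. Two simple, commonly used options are sketched below under that caveat: a hypothetical output layer that maps an unconstrained network output to volumetric water content between residual and saturated values (θr, θs) through a sigmoid, so every prediction is physically bounded by construction, and a hypothetical water-balance penalty that compares the change in column storage with the net flux. Parameter values and the 2 m column depth are placeholders.

```python
import math


def bounded_soil_moisture(raw, theta_r=0.05, theta_s=0.45):
    """Map an unconstrained network output to volumetric water content in (theta_r, theta_s).
    Note: math.exp overflows for raw below about -709; clip raw in production code."""
    return theta_r + (theta_s - theta_r) / (1.0 + math.exp(-raw))


def water_balance_penalty(theta_prev, theta_next, precip, et, runoff, drainage,
                          depth_mm=2000.0):
    """Squared mismatch (mm^2) between the change in column storage and the net water flux."""
    d_storage = (theta_next - theta_prev) * depth_mm
    return (d_storage - (precip - et - runoff - drainage)) ** 2


# Bounded outputs regardless of how extreme the raw network output is.
predictions = [bounded_soil_moisture(r) for r in (-20.0, 0.0, 20.0)]
# A 0.005 m3/m3 rise over 2 m equals 10 mm, exactly matching the net flux of 10 mm.
penalty = water_balance_penalty(0.30, 0.305, precip=15.0, et=3.0, runoff=1.0, drainage=1.0)
```

A hard bound like the sigmoid layer cannot be violated, while a penalty term only discourages violations during training; combining both is a common compromise.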

How to cite: Chang, A. Y.-Y., Leonarduzzi, E., Grams, C. M., and Humphrey, V. W.: A Physics-Constrained Emulator for High-Resolution Soil Moisture, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-12068, https://doi.org/10.5194/egusphere-egu25-12068, 2025.

A.33
|
EGU25-13799
|
ECS
Amit Kumar and Kun Zhang

Cyanobacterial blooms have become more frequent and intense in Lake Superior since 2012, primarily due to increased nutrient loads, with phosphorus being the main limiting factor. To protect water quality, extensive monitoring of lakes and streams is crucial, but measuring nutrients frequently across all ecosystems is neither cost-effective nor practical. This study presents a cost-effective, transferable solution that uses machine learning (ML) models to predict phosphorus concentrations and loads from conventional water quality parameters such as streamflow, dissolved oxygen, conductivity, turbidity, transparency, and total suspended solids. The research introduces an explainable hybrid ML framework that combines probabilistic principal component analysis (P2CA) with several ML models, including Bagging Ensemble Learning, Boosting Ensemble Learning, Gaussian Process Regression, and Support Vector Regression, to enhance prediction accuracy. Results demonstrate that the P2CA-Boosting Ensemble Learning model consistently outperforms the other approaches. To confirm its effectiveness, the developed model was tested with the same input variables in a different river catchment, demonstrating that it also works well in a different environment. This study highlights the potential of combining P2CA with Boosting Ensemble Learning as a powerful tool for water quality management in streams and rivers.
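Probabilistic PCA has a closed-form maximum-likelihood solution (Tipping and Bishop, 1999): the isotropic noise variance is the mean of the discarded covariance eigenvalues, and the loading matrix is built from the leading eigenvectors scaled by the square root of their excess variance. The sketch below fits it to six synthetic, correlated "water quality" predictors generated from two latent factors, and returns posterior-mean latent scores that could then feed a boosting regressor. The data, the number of components and the downstream model are assumptions, not the study's configuration.

```python
import numpy as np


def fit_ppca(X, q):
    """Closed-form maximum-likelihood probabilistic PCA with q latent dimensions."""
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = Xc.T @ Xc / X.shape[0]
    eigval, eigvec = np.linalg.eigh(cov)          # ascending order
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    sigma2 = eigval[q:].mean()                    # noise variance: mean discarded eigenvalue
    W = eigvec[:, :q] * np.sqrt(np.maximum(eigval[:q] - sigma2, 0.0))
    M = W.T @ W + sigma2 * np.eye(q)
    Z = np.linalg.solve(M, W.T @ Xc.T).T          # posterior mean of latent variables
    return mu, W, sigma2, Z


rng = np.random.default_rng(7)
n = 5000
latent = rng.normal(size=(n, 2))
A = np.array([[1.0, 0.8, -0.5, 0.9, -0.7, 0.6],
              [0.2, -0.6, 0.9, 0.1, 0.5, 0.8]])
X = latent @ A + rng.normal(0.0, 0.3, size=(n, 6))  # six correlated predictors, noise sd 0.3
mu, W, sigma2, Z = fit_ppca(X, q=2)
```

Unlike plain PCA, the probabilistic version yields an explicit noise variance (here close to 0.3² = 0.09), which is what makes it usable with missing values and in likelihood-based comparisons.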

How to cite: Kumar, A. and Zhang, K.: Development of a hybrid machine learning model to predict total phosphorus in streams over the north shore of Lake Superior, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-13799, https://doi.org/10.5194/egusphere-egu25-13799, 2025.

A.34
|
EGU25-19057
|
ECS
Fedor Scholz, Christiane Zarfl, Thomas Scholten, and Martin V. Butz

The delineation of catchment areas from elevation is a fundamental step in lumped process-based models (PBMs). Most machine learning (ML) approaches for rainfall-runoff modeling spatially aggregate inputs to represent basin-wide processes. Elevation-based lumping, however, disregards both human interventions such as drainages and underground hydrological flows, which can lead to significant model inaccuracies. In this work, we employ DRRAiNN (Distributed Rainfall-Runoff Artificial Neural Network) – a fully distributed neural network architecture – to infer catchment areas directly from observed precipitation and discharge dynamics without prior delineations.

As a first evaluation of the potential to infer actual catchment areas with DRRAiNN, we trained the model on relatively sparse data from 2006 to 2015: RADOLAN-based hourly precipitation data with a spatial resolution of 4 × 4 km as input, and only daily discharge measurements from 17 stations in the Neckar river basin as target output. Elevation and solar radiation were given as additional parameterization inputs. Because DRRAiNN is fully differentiable, we were then able to infer station-specific attribution maps via backpropagation through space and time. To evaluate the alignment between the inferred attribution maps and elevation-based catchment areas, we compute the Wasserstein distance between attributions inside and outside the catchment boundaries; a higher distance indicates better agreement. The results show that DRRAiNN learns to propagate water in a physically plausible manner. Further, we reveal deviations that indicate additional water flows undetectable from elevation data alone. Our findings thus suggest that DRRAiNN captures key rainfall-runoff dynamics while avoiding the limitations of lumped models.
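For one-dimensional samples, the Wasserstein-1 distance equals the area between the two empirical cumulative distribution functions. The sketch below computes it in plain Python for hypothetical attribution values inside and outside a catchment boundary. SciPy provides the same quantity as `scipy.stats.wasserstein_distance`; how the authors pooled or weighted attributions before comparing them is not stated in the abstract, so the inputs here are assumptions.

```python
import bisect


def wasserstein_1d(u, v):
    """W1 distance between two 1-D samples: integral of |F_u - F_v| over the real line."""
    u, v = sorted(u), sorted(v)
    grid = sorted(set(u) | set(v))
    dist = 0.0
    for left, right in zip(grid[:-1], grid[1:]):
        cdf_u = bisect.bisect_right(u, left) / len(u)  # empirical CDF, constant on [left, right)
        cdf_v = bisect.bisect_right(v, left) / len(v)
        dist += abs(cdf_u - cdf_v) * (right - left)
    return dist


inside = [0.8, 0.6, 0.9, 0.7]          # hypothetical attributions inside the catchment
outside = [0.1, 0.05, 0.2, 0.0, 0.15]  # hypothetical attributions outside it
d = wasserstein_1d(inside, outside)
```

When the two samples do not overlap, as here, W1 reduces to the difference of their means (0.75 − 0.10 = 0.65); as overlap grows, the distance shrinks, matching the interpretation that a higher distance means attributions concentrate inside the delineated catchment.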

The quantitative evaluations, alongside qualitative comparisons, underscore the model's potential for uncovering hidden hydrological processes. We show that catchment area estimates can be inferred from relatively little discharge data, which could in the future be substituted by satellite data. As a result, DRRAiNN may be applicable in ungauged catchments. Given actual discharge measurements or discharge estimates, DRRAiNN can be used to analyze the hydrological dynamics of surface and subsurface runoff as well as to estimate baseflow, and it has the potential to uncover unexpected and unknown runoff dynamics that would not be detectable otherwise.

How to cite: Scholz, F., Zarfl, C., Scholten, T., and Butz, M. V.: Inference of catchment areas from modeled discharge dynamics, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-19057, https://doi.org/10.5194/egusphere-egu25-19057, 2025.

A.35
|
EGU25-18468
|
ECS
Vidhi Singh, Abhilash Singh, and Kumar Gaurav

Soil moisture, one of the essential climate variables, forms a fundamental bridge between hydro-meteorological processes and influences climate dynamics. It is highly variable and is driven by numerous hydrological, agricultural, and ecological factors. Soil moisture in turn affects soil-forming processes, root-zone water availability, infiltration rates, runoff, groundwater storage, and vegetation-soil interactions. Despite its significant role in hydro-ecological interactions, its variability in the subsurface has not yet been adequately explored. Precise estimation of soil moisture at various depths is crucial because it affects water retention characteristics and modulates the vertical and lateral movement of water within the soil profile. This subsurface information is integral to understanding recharge rates, groundwater interactions, and the overall water balance within a catchment. In this study, we present an automated machine learning (AutoML) framework that uses Bayesian optimization to predict soil moisture at depths of 10 cm, 20 cm, 30 cm, and 40 cm. We collected data from our hydrological observatory, which consists of an automatic weather station, a pan evaporimeter, and a soil moisture recorder. To evaluate model performance, we divided the dataset into four scenarios (S1, S2, S3, and S4), with each subsequent scenario incorporating more observations and rainfall events. We trained the AutoML model on 11 input features that combine hydrological and meteorological variables with in situ soil moisture data. Among the predictor variables, humidity, dew point, and rainfall emerged as the most influential factors driving soil moisture variability. Performance metrics were calculated for the entire dataset and for subsets containing only rainfall events.
Our optimized model showed strong performance, with R² of 0.88–0.99 and RMSE < 0.022 for the overall dataset, and R² of 0.76–1.00 with RMSE < 0.06 for the rainfall-specific data across all soil moisture depths.
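Bayesian optimization fits a probabilistic surrogate (commonly a Gaussian process) to the validation scores observed so far and chooses the next configuration by maximizing an acquisition function such as expected improvement (EI). The sketch below applies this loop to a single hypothetical hyperparameter h in [0, 1] with an invented validation-error curve whose minimum is at h = 0.3. The kernel length scale, the jitter, the candidate grid and the objective are assumptions; real AutoML tools tune many hyperparameters jointly and typically also fit the kernel hyperparameters.

```python
import math

import numpy as np


def validation_error(h):
    """Hypothetical stand-in for 'train a model with hyperparameter h, return validation RMSE'."""
    return 0.02 + (h - 0.3) ** 2


def rbf(a, b, length=0.15):
    """Squared-exponential kernel with unit variance."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)


def gp_posterior(x_obs, y_obs, x_new, jitter=1e-6):
    """Zero-mean GP posterior mean and standard deviation at x_new."""
    K = rbf(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    Ks = rbf(x_obs, x_new)
    mean = Ks.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))


def expected_improvement(mean, std, best):
    """Expected improvement for minimisation."""
    imp = best - mean
    z = imp / std
    cdf = 0.5 * (1.0 + np.array([math.erf(v / math.sqrt(2.0)) for v in z]))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return imp * cdf + std * pdf


def bayes_opt(f, candidates, init, n_iter=10):
    xs = list(init)
    ys = [f(x) for x in xs]
    for _ in range(n_iter):
        y = np.array(ys)
        y_std = (y - y.mean()) / (y.std() + 1e-12)  # standardise targets for the GP
        mean, std = gp_posterior(np.array(xs), y_std, candidates)
        x_next = float(candidates[np.argmax(expected_improvement(mean, std, y_std.min()))])
        xs.append(x_next)
        ys.append(f(x_next))
    i = int(np.argmin(ys))
    return xs[i], ys[i]


candidates = np.linspace(0.0, 1.0, 101)
best_h, best_err = bayes_opt(validation_error, candidates, init=[0.0, 0.5, 1.0])
```

The expensive step (training and validating a model) happens only once per iteration, while the cheap surrogate decides where to look next; this trade-off is what makes Bayesian optimization attractive for AutoML.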

How to cite: Singh, V., Singh, A., and Gaurav, K.: An automated machine learning framework for multi-depth soil moisture prediction using hydro-meteorological datasets, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-18468, https://doi.org/10.5194/egusphere-egu25-18468, 2025.

A.36
|
EGU25-4845
|
ECS
David Strahl, Sebastian Gnann, Karoline Wiesner, and Thorsten Wagener

Catchments are the fundamental units of hydrological analysis and integrate a vast number of physical, biological, and anthropogenic processes. Traditional hydrological modelling approaches, however, adopt a bottom-up perspective, aggregating small-scale physical principles to predict large-scale catchment behaviour. While effective for prediction, this approach can fall short in advancing our understanding of emergent processes and their interactions given the strong dependence on a priori assumptions. To address this gap, causal discovery algorithms offer a promising alternative by moving beyond simple correlation to directly identifying the dynamic causal structures emerging at the catchment scale. In this study, we applied the PCMCI+ algorithm to the CAMELS-US dataset in combination with a subsequent causal effect estimation. We explored how and to what extent dynamic causal structures can be learned from hydro-meteorological data alone, and which catchment properties and conditions influence their expression. We find that causal discovery in hydrology faces challenges due to non-stationarity, unsuitable conditional independence tests, and unmet methodological assumptions. Despite these limitations, our approach reconstructed physically plausible relationships controlled by meaningful catchment properties. These results highlight the potential of causal discovery in hydrology, where it could serve as a complementary framework for model evaluation studies or as an integral part of the model development process.
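Conditional independence tests are the building block of PCMCI+, and partial correlation (ParCorr) is a common choice for linear dependencies. The sketch below tests whether a lagged driver X(t−1) is associated with streamflow Q(t) after conditioning on Q(t−1), using residuals of least-squares regressions and a Fisher z-transform for an approximate two-sided p-value. The synthetic series and the single conditioning set are assumptions; the full PCMCI+ algorithm (condition selection, momentary conditional independence tests, contemporaneous orientation) is not reproduced.

```python
import math

import numpy as np


def partial_corr_test(x, y, z):
    """Partial correlation of x and y given conditioning variables z, with a Fisher-z p-value."""
    n = len(x)
    Z = np.column_stack([np.ones(n), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]  # residual of x after regressing on z
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]  # residual of y after regressing on z
    r = float(np.corrcoef(rx, ry)[0, 1])
    k = Z.shape[1] - 1                                  # number of conditioning variables
    stat = math.atanh(r) * math.sqrt(n - k - 3)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))     # two-sided normal p-value
    return r, p_value


rng = np.random.default_rng(3)
n = 2000
precip = rng.gamma(0.5, 4.0, n)
temp = rng.normal(10.0, 5.0, n)  # unrelated to streamflow in this toy system
q = np.zeros(n)
for t in range(1, n):
    q[t] = 0.6 * q[t - 1] + 0.5 * precip[t - 1] + rng.normal(0.0, 0.5)

r_p, p_p = partial_corr_test(precip[:-1], q[1:], q[:-1])  # P(t-1) -> Q(t) | Q(t-1)
r_t, p_t = partial_corr_test(temp[:-1], q[1:], q[:-1])    # T(t-1) -> Q(t) | Q(t-1)
```

Partial correlation only detects linear dependence and assumes roughly Gaussian, stationary residuals; skewed precipitation and nonlinear, non-stationary catchment responses violate these assumptions, which is one reason an unsuitable test can mislead causal discovery in hydrology.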

How to cite: Strahl, D., Gnann, S., Wiesner, K., and Wagener, T.: Using Causal Discovery to Identify Drivers and Controls of Streamflow in Large Sample Hydrology, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-4845, https://doi.org/10.5194/egusphere-egu25-4845, 2025.

A.37
|
EGU25-13311
|
ECS
Georgios Blougouras, Alexander Brenning, Mirco Migliavacca, and Markus Reichstein

Vegetation plays an important but complicated role in modulating land-atmosphere interactions and the water cycle. Under global change, increasing trends in vegetation greenness have been observed, which further complicate the role of vegetation in the Earth system. Despite growing interest in the role of vegetation in hydrological processes, large uncertainties remain, particularly regarding the underexplored response of streamflow to vegetation greening. In this study, we explore the watershed-relevant biophysical controls of vegetation greening on streamflow. To do so, we develop a hybrid ecohydrological model that adheres to water balance principles while having a flexible structure that enables the integration of physical insights from observational data. Multi-task learning optimization ensures physical consistency across a range of processes and temporal frequencies, which allows us to investigate the cascading impacts of vegetation changes across the water cycle, up to streamflow as the end process. Ecohydrological insights are derived directly from observational data, while physically meaningful model parameters reflect how ecosystem functions and hydrological processes respond to vegetation changes. We find that the marked change in streamflow can be attributed to the controls that vegetation change exerts on diverse biophysical processes. Our research highlights the potential of hybrid models to capture complex Earth system processes by exploiting multiple observational data streams, machine learning, and physical constraints.
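A minimal sketch of the hybrid idea, under stated assumptions: a single-bucket water balance in which a hypothetical learned component (here a fixed logistic curve of leaf area index, LAI) scales evapotranspiration, and a multi-task loss scores simulated evapotranspiration and streamflow against their respective observations. Because every flux is drawn from the bucket, the water balance closes by construction. The functional forms, parameters and forcing are illustrative, not the authors' model.

```python
import math


def et_fraction(lai, w=1.2, b=-2.0):
    """Hypothetical learned component: share of potential ET realised, as a function of LAI."""
    return 1.0 / (1.0 + math.exp(-(w * lai + b)))


def run_bucket(precip, pet, lai, capacity=150.0, k=0.05, s0=50.0):
    """Single-bucket water balance; every flux is drawn from storage, so mass is conserved."""
    s = s0
    et_sim, q_sim = [], []
    for p, e0, leaf in zip(precip, pet, lai):
        s += p
        et = min(s, et_fraction(leaf) * e0 * min(1.0, s / capacity))
        s -= et
        overflow = max(0.0, s - capacity)  # saturation excess
        s -= overflow
        baseflow = k * s
        s -= baseflow
        et_sim.append(et)
        q_sim.append(overflow + baseflow)
    return et_sim, q_sim, s


def nmse(sim, obs):
    """Squared error normalised by the variance of the observations."""
    mean = sum(obs) / len(obs)
    return sum((s - o) ** 2 for s, o in zip(sim, obs)) / sum((o - mean) ** 2 for o in obs)


def multitask_loss(et_sim, et_obs, q_sim, q_obs, weights=(0.5, 0.5)):
    """Weighted sum of per-variable losses, so both data streams constrain the model."""
    return weights[0] * nmse(et_sim, et_obs) + weights[1] * nmse(q_sim, q_obs)


precip = [10.0, 0.0, 25.0, 0.0, 5.0, 0.0, 40.0, 0.0]
pet = [3.0, 4.0, 3.0, 5.0, 4.0, 4.0, 3.0, 5.0]
lai = [1.0, 1.5, 2.0, 2.5, 3.0, 3.0, 2.5, 2.0]
et_sim, q_sim, s_end = run_bucket(precip, pet, lai)
```

Normalizing each task's error by its observed variance keeps variables with different magnitudes (evapotranspiration versus streamflow) from dominating the combined loss.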

How to cite: Blougouras, G., Brenning, A., Migliavacca, M., and Reichstein, M.: Hybrid hydrological modelling of the biophysical impacts of earth’s greening on streamflow, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-13311, https://doi.org/10.5194/egusphere-egu25-13311, 2025.

A.38
|
EGU25-11693
|
ECS
Olivier Bonte, Diego G. Miralles, Akash Koppa, and Niko E. C. Verhoest

Terrestrial evaporation (E) is an essential climate variable, linking the water, energy and carbon cycles. As E is influenced by the state of the atmospheric boundary layer, vegetation and soil, its modelling is a complex task, resulting in a myriad of simulation approaches. To combine the strong predictive skill of data-driven models with the interpretability and physical consistency of process-based models (PBMs), a new research field of differentiable modelling has emerged¹.

Here, we present a differentiable framework for E estimation that facilitates online training of neural networks (NNs) as intermediate PBM components. It is inspired by the GLEAM framework for estimating E, which applies offline training (i.e., outside the PBM) of NNs predicting evaporative stress²,³. Building upon the Julia SciML ecosystem's implementation of universal differential equations⁴, the framework provides a wide array of numerical methods for solving the PBM's ordinary differential equations (ODEs) and calculating the parameter sensitivities⁵. In this way, the effect of the numerical methods on the obtained hybrid model can be investigated, moving beyond the direct automatic differentiation through explicit Euler solutions of ODEs that is often applied in other hydrological hybrid modelling approaches.
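To illustrate why the choice of numerical method matters in a universal differential equation, the sketch below integrates a hypothetical storage ODE, dS/dt = P − β(S)·Ep, where β(S) is a placeholder for a neural-network stress factor, with explicit Euler and with classical fourth-order Runge-Kutta. It is written in Python rather than Julia and does not use SciML; the stress function, constant forcing and step size are assumptions. With the linear placeholder β(S) = S/Smax and constant forcing, the exact solution is known, so each scheme's error can be measured.

```python
import math

S_MAX, P, EP = 100.0, 2.0, 5.0  # storage capacity (mm) and constant forcing (mm/day), assumed


def stress(s):
    """Placeholder for a neural-network stress factor in [0, 1]."""
    return min(max(s / S_MAX, 0.0), 1.0)


def rhs(s):
    """dS/dt = P - beta(S) * Ep."""
    return P - stress(s) * EP


def explicit_euler(s0, dt, n_steps):
    s = s0
    for _ in range(n_steps):
        s += dt * rhs(s)
    return s


def rk4(s0, dt, n_steps):
    s = s0
    for _ in range(n_steps):
        k1 = rhs(s)
        k2 = rhs(s + 0.5 * dt * k1)
        k3 = rhs(s + 0.5 * dt * k2)
        k4 = rhs(s + dt * k3)
        s += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return s


def exact(s0, t):
    """Analytical solution for the linear stress factor (valid while 0 <= S <= S_MAX)."""
    a = EP / S_MAX
    return P / a + (s0 - P / a) * math.exp(-a * t)


s0, dt, n_steps = 80.0, 5.0, 12
err_euler = abs(explicit_euler(s0, dt, n_steps) - exact(s0, dt * n_steps))
err_rk4 = abs(rk4(s0, dt, n_steps) - exact(s0, dt * n_steps))
```

With a 5-day step, explicit Euler misses the exact storage by roughly 0.7 mm while RK4 stays within about 10⁻³ mm. A neural component trained through the Euler solution would learn to compensate for this discretization error, which is the effect the framework above is designed to investigate.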

 

References

¹ Shen, C., Appling, A. P., Gentine, P., et al.: Differentiable modelling to unify machine learning and physical models for geosciences, Nat. Rev. Earth Environ., 4, 552–567, 2023, https://doi.org/10.1038/s43017-023-00450-9

² Koppa, A., Rains, D., Hulsman, P., et al.: A deep learning-based hybrid model of global terrestrial evaporation, Nat. Commun., 13, 1912, 2022, https://doi.org/10.1038/s41467-022-29543-7

³ Miralles, D. G., Bonte, O., Koppa, A., et al.: GLEAM4: global land evaporation dataset at 0.1° resolution from 1980 to near present, preprint, 2024, https://doi.org/10.21203/rs.3.rs-5488631/v1

⁴ Rackauckas, C., Ma, Y., Martensen, J., et al.: Universal differential equations for scientific machine learning, arXiv, 2020, https://doi.org/10.48550/arXiv.2001.04385

⁵ Sapienza, F., Bolibar, J., Schäfer, F., et al.: Differentiable Programming for Differential Equations: A Review, arXiv, 2024, https://doi.org/10.48550/arXiv.2406.09699

How to cite: Bonte, O., Miralles, D. G., Koppa, A., and Verhoest, N. E. C.: Universal differential equations for estimating terrestrial evaporation, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-11693, https://doi.org/10.5194/egusphere-egu25-11693, 2025.

A.39
|
EGU25-3859
|
ECS
Rahma Khalid and Usman T Khan

Flood susceptibility mapping (FSM) plays a crucial role in proactive flood risk management, particularly in light of increasing fluvial flooding. Traditional FSM methods, such as physics-based and qualitative approaches, are hindered by either high computational demands or inherent uncertainty. To address this, machine learning (ML) models have become an increasingly popular FSM approach, although they are commonly described as black boxes because their underlying mechanisms are difficult to understand. To better understand the ML approaches used for FSM, this study uses gradient-weighted class activation mapping (Grad-CAM) to interpret the flood susceptibility predictions of a convolutional neural network (CNN) for the Don River watershed in Ontario, Canada. Grad-CAM is an explainability algorithm that highlights input regions influential to the output, helping the user understand and visualize which features the model relied on to arrive at its prediction. Grad-CAM results are compared with those of the commonly used SHapley Additive exPlanations (SHAP) algorithm. SHAP calculates the relative contribution of each input to the output and, owing to its popularity, provides a benchmark for comparison.

A two-dimensional CNN with two convolutional layers, two pooling layers, and a fully connected layer is used to predict flood susceptibility. The inputs to the CNN include topographical and climatic variables across the entire watershed, with a 60/40 split between training and testing data. The CNN results were compared against the floodplain map of the Don River. Using the area under the receiver operating characteristic curve (AUC-ROC) as the performance metric, the CNN performs well, with an AUC-ROC of 0.96.
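AUC-ROC can be read as the probability that a randomly chosen flooded cell receives a higher susceptibility score than a randomly chosen non-flooded cell. The sketch below computes it directly from that definition, assuming cells inside the Don River floodplain map are labelled 1 and all other cells 0; the labelling rule, the function name `auc_roc`, and the toy data are illustrative assumptions.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the probability that a positive (flooded) cell is scored
    higher than a negative (non-flooded) cell; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative cell")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie is half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical toy cells: 1 = inside mapped floodplain, 0 = outside.
print(auc_roc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The pairwise loop is quadratic in the number of cells; for a full watershed grid, a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` would be more practical.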

The study highlights the potential of CNNs for flood susceptibility mapping and compares two explainable machine learning algorithms, supporting their wider application in FSM. Explainable algorithms are essential for decision makers in flood risk management to support proactive planning and resource allocation. Future work should extend the approach to predict flood susceptibility at the national level.

How to cite: Khalid, R. and Khan, U. T.: Explainable convolutional neural network for flood susceptibility mapping in Southern Ontario, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-3859, https://doi.org/10.5194/egusphere-egu25-3859, 2025.

A.40
|
EGU25-16863
|
ECS
Ho Tin Hung and Li-Pen Wang

The Integrated Multi-satellite Retrievals for GPM (IMERG) is a global satellite-based precipitation dataset that provides near-real-time precipitation estimates by combining multiple satellite measurements. IMERG integrates microwave (MW) observations from low-orbit satellites with precipitation estimates inferred from the brightness temperature of geostationary infrared (IR) imagery. MW measurements provide accurate precipitation estimates because they interact directly with precipitation particles, while IR measurements offer broader spatial and temporal coverage by inferring precipitation from cloud-top brightness temperatures. Together, these complementary techniques balance precision and coverage to improve global precipitation monitoring. However, IR-based precipitation estimates are inherently less reliable because of the weak direct correlation between brightness temperature and precipitation. Conversely, MW-derived estimates are more accurate but spatially constrained by the limited footprint of low-orbit satellites. To investigate the factors that contribute to calibrating the errors in IR precipitation estimates, we leveraged ERA5-Land, a high-resolution reanalysis dataset that includes surface variables across nine domains, such as temperature, soil moisture, radiation, and vegetation indices. These variables offer a comprehensive lens for understanding the impact of the land surface on precipitation dynamics. We employed an XGBoost model to predict the errors in IR precipitation estimates relative to MW-derived benchmarks. In addition, SHapley Additive exPlanations (SHAP) values were used to interpret the model's predictions, revealing how individual input features contribute to the error correction.
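SHAP values are Shapley values from cooperative game theory: each feature's attribution is its average marginal contribution to the prediction over all subsets of the other features. The brute-force sketch below makes that definition concrete for a toy three-feature function standing in for the trained XGBoost error model; features outside a coalition are replaced by a single baseline value, which is only one of several possible conventions. The function names and the example model are hypothetical, and libraries such as `shap` use much faster tree-specific algorithms (TreeSHAP) for gradient-boosted trees instead of this exponential enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    The cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        point = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(point)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(v):
    # Hypothetical stand-in: an additive term plus an interaction term.
    return 2.0 * v[0] + v[1] * v[2]


print(shapley_values(toy_model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))  # approximately [2.0, 3.0, 3.0]
```

For the toy model, the additive term is credited entirely to the first feature, the interaction is split equally between the other two, and the attributions sum to `model(x) - model(baseline)`.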


Our findings indicate that the explainable machine learning model can correct IR precipitation estimates so that they more closely resemble MW products, achieving notable improvements across statistical metrics. In a preliminary analysis of 165 countries and territories, calibration with the XGBoost model reduced the RMSE in all validation datasets, with a median reduction of 19.89% and a mean reduction of 22.5%. Similarly, the correlation coefficient improved, with a median increase of 18.43% and a mean increase of 54.49%. Moreover, the SHAP values of the variables show distinct spatial and temporal patterns. Their clustered spatial distribution may reflect local climate attributes in specific geographic regions, providing insights into how regional environmental factors influence precipitation estimates. Their temporal distribution may indicate seasonal variation, which can help identify patterns in precipitation dynamics and refine IR-based calibration by accounting for temporal variability in precipitation processes. This study provides a robust framework for leveraging land surface variables to refine IR-based precipitation products. By integrating reanalysis data with machine learning models, we present a scalable solution for improving precipitation monitoring in data-sparse regions, particularly where MW observations are unavailable.
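The summary statistics above are relative changes computed for each country or territory and then aggregated. A minimal sketch, assuming each validation dataset yields one RMSE before and one after calibration; the function names and toy numbers are illustrative only and are not results from the study.

```python
from math import sqrt
from statistics import mean, median


def rmse(estimates, reference):
    """Root-mean-square error between two equally long series."""
    return sqrt(sum((e - r) ** 2 for e, r in zip(estimates, reference)) / len(reference))


def percent_reduction(before, after):
    """Relative reduction (%) of an error metric such as RMSE."""
    return (before - after) / before * 100.0


def summarise_reductions(pairs):
    """pairs: one (RMSE before, RMSE after) tuple per country or territory.

    Returns the median and mean percentage reduction.
    """
    reductions = [percent_reduction(before, after) for before, after in pairs]
    return median(reductions), mean(reductions)


# Toy numbers, not results from the study.
print(summarise_reductions([(2.0, 1.5), (4.0, 2.0), (8.0, 7.0)]))
```

Relative changes become large when the baseline value is small, so the mean of such percentages is sensitive to a few regions with weak initial skill; reporting the median alongside the mean guards against this.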

How to cite: Hung, H. T. and Wang, L.-P.: IRMerg: Enhancing Global Infrared Precipitation Estimates with Land Surface Variables and Contributing Factors Analysis Using Explainable Machine Learning, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-16863, https://doi.org/10.5194/egusphere-egu25-16863, 2025.

A.41
|
EGU25-18102
|
ECS
Sara Asadi, Patricia Jimeno-Sáez, Adrián López-Ballesteros, and Javier Senent-Aparicio

Precise streamflow forecasting in river systems is crucial for water resources management and flood risk assessment. This study focuses on the Tagus Headwaters River Basin (THRB) in Spain, a key hydrological basin that provides essential water for urban, industrial, and irrigation purposes. In addition, a significant portion of its water resources is transferred to the Segura River Basin through the Tagus-Segura water transfer, Spain's most extensive hydraulic infrastructure. Because nearly all available water in the THRB is allocated to these demands, precise streamflow forecasting is vital. For streamflow estimation in this basin, we evaluated the Soil and Water Assessment Tool (SWAT+), a physically-based model, and three AI-based models: support vector regression (SVR), a feed-forward neural network (FFNN), and long short-term memory (LSTM), at four gauging stations within the THRB. For the AI-based models, rainfall and time-lagged runoff data were used as inputs. We also evaluated an ensemble machine learning technique that uses the outputs of both the physically-based and the AI-based individual models as its inputs. The results show that the AI-based models and the ensemble technique significantly outperformed the SWAT+ model. The AI-based models were already considerably more precise than SWAT+, and the ensemble technique improved precision relative to the AI-based models by a further 18 to 26% during the calibration period and 4.1 to 9.2% during the validation period. Furthermore, the SHapley Additive exPlanations (SHAP) methodology was used to explore how each model contributes to the predictions of the ensemble. This work was supported by the Spanish Ministry of Science and Innovation under grant PID2021-128126OA-I00.
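The ensemble step feeds the simulations of the individual models into a second-level model (stacking). The abstract does not specify which learner is used at that level, so the sketch below assumes the simplest option, a linear meta-model fitted by ordinary least squares through the normal equations; the function names and toy series are hypothetical.

```python
def fit_stacking_weights(member_outputs, observed):
    """Least-squares weights (intercept first) for a linear meta-model that
    combines the simulations of several member models.

    member_outputs: list of series, one per member model (e.g. SWAT+, SVR, FFNN, LSTM).
    observed: observed streamflow series of the same length.
    """
    n = len(observed)
    # Design matrix rows: [1, m1(t), m2(t), ...]
    rows = [[1.0] + [series[t] for series in member_outputs] for t in range(n)]
    p = len(rows[0])

    # Normal equations (X^T X) w = X^T y as an augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * y for r, y in zip(rows, observed))]
        for i in range(p)
    ]

    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("member outputs are collinear; weights are not unique")
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]

    # Back substitution.
    w = [0.0] * p
    for i in reversed(range(p)):
        w[i] = (aug[i][p] - sum(aug[i][j] * w[j] for j in range(i + 1, p))) / aug[i][i]
    return w


def predict_stacked(weights, member_outputs):
    """Apply the fitted linear meta-model to member simulations."""
    n = len(member_outputs[0])
    return [
        weights[0] + sum(w * s[t] for w, s in zip(weights[1:], member_outputs))
        for t in range(n)
    ]


# Hypothetical member simulations and observations.
m1 = [1.0, 2.0, 3.0, 4.0, 5.0]
m2 = [2.0, 1.0, 4.0, 3.0, 6.0]
obs = [0.5 + 0.3 * a + 0.7 * b for a, b in zip(m1, m2)]
print(fit_stacking_weights([m1, m2], obs))  # close to [0.5, 0.3, 0.7]
```

In practice the meta-model would be fitted on the calibration period only and evaluated on the validation period; a nonlinear learner could replace the linear one without changing the overall structure.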

How to cite: Asadi, S., Jimeno-Sáez, P., López-Ballesteros, A., and Senent-Aparicio, J.: In the application of physically-based and interpretable AI-based models for streamflow simulation, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-18102, https://doi.org/10.5194/egusphere-egu25-18102, 2025.

Posters virtual: Tue, 29 Apr, 14:00–15:45 | vPoster spot A

The posters scheduled for virtual presentation are visible in Gather.Town. Attendees are asked to meet the authors during the scheduled attendance time for live video chats. If authors uploaded their presentation files, these files are also linked from the abstracts below. The button to access Gather.Town appears just before the time block starts. Onsite attendees can also visit the virtual poster sessions at the vPoster spots (equal to PICO spots).
Display time: Tue, 29 Apr, 08:30–18:00
Chairperson: Louise Slater

EGU25-10531 | ECS | Posters virtual | VPS9

Climate and catchment influences on streamflows in Brazilian watersheds 

Abderraman Brandão, Admin Husic, André Almagro, Dimaghi Schwamback, and Paulo Oliveira
Tue, 29 Apr, 14:00–15:45 (CEST) | vPA.2

South America holds vast freshwater reserves, contributing to its global prominence across various sectors. Understanding streamflows at different levels (minimum flows for ecosystem maintenance, mean flows for hydropower and navigation, and high flows associated with floods) is critical for ensuring societal and ecological resilience. These streamflows are influenced by changes in catchment characteristics and by climate change, yet the relationships between climate and catchment drivers and streamflows, particularly in tropical regions, remain poorly understood. Recent advances in explainable artificial intelligence (XAI) offer promising avenues for addressing these gaps by linking observational data to potential causal relationships. Here, we investigated the climatic and catchment drivers of five streamflow types (Q1, Q5, Qmean, Q95, and Q99) across 735 Brazilian watersheds using XAI approaches. Random Forest models were trained with the 16 most important attributes for each streamflow type. SHapley Additive exPlanations (SHAP) were applied to explain the direction and magnitude of each driver's impact, and inflection points were delineated to capture critical thresholds for streamflow changes. The results showed the aridity index (potential evapotranspiration/precipitation) to be the most impactful predictor overall, likely because of its role in the long-term water balance. For Q99, however, soil sand content emerges as the dominant factor, showing that catchment characteristics rival climatic factors in importance for rare streamflow events. The analysis highlighted critical thresholds, such as reductions in streamflow when the aridity index exceeds 1.30 and potential declines in streamflow when soil carbon content falls below 30%, likely due to reduced water infiltration and storage capacity. Similarly, forest cover below 40% potentially increases streamflows, possibly due to reduced evapotranspiration and reduced water retention in soils.
Regional differences were also observed: in central Brazil, land cover, land use, and topography are potentially responsible for decreases in low streamflows, whereas in the south and northeast, climatic factors such as aridity and precipitation seasonality control these potential decreases. For rare high-flow events (Q99) in the south, watershed-scale attributes such as height above the nearest drainage, permeability, and porosity potentially increase the magnitude of events. These findings highlight that, while climatic attributes dominate streamflow relationships at the national scale, regional variations underscore the importance of catchment characteristics. This study demonstrates how data-driven models have the potential to capture the complex interplay between climatic and catchment attributes and to link these factors to streamflow dynamics.
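The abstract does not state how the inflection points were delineated. One simple reading, assumed here, is the feature value at which a driver's SHAP contribution changes sign, for example where a rising aridity index switches from raising to lowering predicted streamflow. The sketch below finds such crossings by linear interpolation between neighbouring points; the function name and the toy values are hypothetical and do not reproduce the study's data.

```python
def shap_sign_thresholds(feature_values, shap_values):
    """Feature values at which a driver's SHAP contribution changes sign.

    Points are sorted by feature value and each sign change between
    neighbours is located by linear interpolation. Exact zeros are not
    treated as crossings in this simplified version.
    """
    points = sorted(zip(feature_values, shap_values))
    thresholds = []
    for (x0, s0), (x1, s1) in zip(points, points[1:]):
        if s0 * s1 < 0:
            # Solve s0 + (s1 - s0) * t = 0 for t in (0, 1).
            t = s0 / (s0 - s1)
            thresholds.append(x0 + t * (x1 - x0))
    return thresholds


# Hypothetical toy dependence: positive contributions at low values, negative at high.
print(shap_sign_thresholds([0.5, 1.0, 1.5, 2.0], [0.4, 0.2, -0.2, -0.4]))  # [1.25]
```

Real SHAP dependence plots are noisy, so the values would usually be smoothed or binned first to avoid spurious crossings.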

How to cite: Brandão, A., Husic, A., Almagro, A., Schwamback, D., and Oliveira, P.: Climate and catchment influences on streamflows in Brazilian watersheds, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-10531, https://doi.org/10.5194/egusphere-egu25-10531, 2025.

EGU25-19050 | ECS | Posters virtual | VPS9

Application of Unsupervised Machine Learning Algorithms for identifying critical river confluence in a mountainous watershed. 

Naman Rajouria, Pragati Parajapati, and Sanjeev Kumar Jha
Tue, 29 Apr, 14:00–15:45 (CEST) | vPA.3

In a mountainous watershed, there are many confluences at which two or more streams join. Because of inaccessible terrain and the associated costs, river discharge data are collected at only a few confluences. It is, therefore, important to assess which confluences are critical. By critical, we mean the junction that would create maximum fragmentation in the river network. In this study, we analysed river networks with uneven topography in the Alaknanda River basin, which is vulnerable and prone to geo-hydro hazards. We applied Unsupervised Machine Learning (UML) algorithms such as Isolation Forest and Density-Based Spatial Clustering of Applications with Noise (DBSCAN), as well as Linear Integer Programming (LIP), to identify critical confluence locations. We compared our results with well-established graph-based centrality metrics (degree, betweenness, closeness, and eigenvector centrality). Our results suggest that DBSCAN outperformed the other approaches in detecting critical nodes, and that LIP performed better than all remaining techniques except DBSCAN. The outcome of this study will help the Central Water Commission decide which confluences to focus on and assess locations for new gauges.
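The notion of a critical confluence as the one creating maximum fragmentation can be made concrete with a simple graph baseline, distinct from the DBSCAN, Isolation Forest, and LIP approaches used in the study. The sketch assumes that fragmentation is measured by deleting a junction and recording the size of the largest remaining connected part of the network; the adjacency-list input, the function names, and the toy network are hypothetical.

```python
from collections import deque


def components_after_removal(adjacency, removed):
    """Sizes of the connected components left once node `removed` is deleted."""
    seen = {removed}
    sizes = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search over one component.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        sizes.append(size)
    return sizes


def most_fragmenting_node(adjacency):
    """Node whose removal leaves the smallest largest component
    (ties broken in favour of more components)."""
    def score(node):
        sizes = components_after_removal(adjacency, node)
        return (max(sizes, default=0), -len(sizes))
    return min(adjacency, key=score)


# Hypothetical river network as an undirected adjacency list (O = outlet).
network = {
    "O": ["J1"],
    "J1": ["O", "A", "J2"],
    "A": ["J1"],
    "J2": ["J1", "B", "C", "D"],
    "B": ["J2"],
    "C": ["J2"],
    "D": ["J2"],
}
print(most_fragmenting_node(network))  # J2
```

Here `J2` is selected because removing it leaves no connected piece larger than three nodes, whereas removing `J1` leaves a four-node piece.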

Keywords: Critical nodes; Alaknanda Basin; Machine Learning; Hazards

How to cite: Rajouria, N., Parajapati, P., and Jha, S. K.: Application of Unsupervised Machine Learning Algorithms for identifying critical river confluence in a mountainous watershed., EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-19050, https://doi.org/10.5194/egusphere-egu25-19050, 2025.