- 1 Civil, Geological, and Environmental Engineering, University of Saskatchewan, Saskatoon, Canada (amin.elshorbagy@usask.ca)
- 2 Ocean, Coastal, and River Engineering Research Centre, NRC, Ottawa, Canada
- 3 Alberta Environment and Protected Areas, Government of Alberta, Edmonton, Canada
- 4 Manitoba Transportation and Infrastructure, Hydrologic Forecast Centre, Winnipeg, Canada
The use of artificial intelligence (AI) and machine learning (ML) approaches in various scientific and engineering disciplines has grown exponentially over recent years. This upsurge also includes applications of physics-guided ML models and explainable AI. However, in addition to the difficulties involved in identifying relevant model inputs, establishing the advantages, contributions, and credibility of ML models remains an open challenge, especially when these models are evaluated against the perceptual hydrologic understanding of the system in question. In this study, we investigate some of these challenges using the case of seasonal streamflow forecasting, with lead times of up to three months, in several hydrologically challenging river basins of the Prairie Provinces of Canada (i.e., Alberta, Saskatchewan, and Manitoba).
Multiple ML techniques, including Random Forest (RF) and Long Short-Term Memory (LSTM) models, are used to produce ensemble forecasts for 135 sub-basins of the Nelson-Churchill River Basin, which spans the vast area from the Rocky Mountains to Hudson Bay. The forecasts are produced at a monthly temporal resolution and at spatial scales ranging from the order of 200 km² to ~1.0 × 10⁶ km², as reflected by the drainage areas of the sub-basins. A large set of potential inputs (105 predictors) is used in this study. These potential inputs include hydrometeorological variables derived from the Daymet database and Environment and Climate Change Canada’s hydrometric network, hydrometeorological forecasts from the European Centre for Medium-Range Weather Forecasts, and various static attributes of the sub-basins.
Pearson’s correlation coefficient (CC) and Partial Mutual Information (PMI) were used, as model-agnostic methods, to analyze the set of potential predictors and identify the most appropriate inputs for seasonal flow forecasting prior to ML model development. Subsequently, modeling experiments were designed to investigate ML model performance and to test the usefulness of the CC- and PMI-based selections for the modeling results. The model-agnostic and model-dependent findings were compared and analyzed in light of the perceptual understanding of the hydrological system. Furthermore, the Convergent Cross-Mapping (CCM) method was applied to selected variables to explore causal, rather than merely correlational, relationships and to interpret the results with the aim of developing ethical and responsible ML (ERML) models. We define ERML models as data-driven models that are transparent and hydrologically explainable.
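As a rough illustration of why a PMI-style screen can differ from a CC-based one, the following minimal sketch uses purely synthetic data. It is not the procedure used in this study: the variable names (`swe`, `swe_copy`, `precip`, `flow`), the histogram-based mutual information estimator, and the use of linear residuals in place of the kernel regression commonly used in PMI are all simplifying assumptions made for the example.

```python
import numpy as np

def pearson_cc(x, y):
    """Pearson's correlation coefficient between two 1-D series."""
    return float(np.corrcoef(x, y)[0, 1])

def mutual_information(x, y, bins=16):
    """Plug-in (histogram) estimate of mutual information, in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    mask = pxy > 0                        # skip empty cells to avoid log(0)
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

def linear_residual(target, given):
    """Part of `target` not explained by a linear fit on `given`.
    Assumption: a linear fit stands in for the conditional expectation."""
    A = np.column_stack([np.ones_like(given), given])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return target - A @ coef

def partial_mi(x, y, z, bins=16):
    """Simplified partial MI between x and y given an already selected z."""
    return mutual_information(linear_residual(x, z),
                              linear_residual(y, z), bins)

# Synthetic, hypothetical predictors (standardized units, no real data).
rng = np.random.default_rng(0)
n = 2000
swe = rng.normal(size=n)                        # e.g. a snowpack index
swe_copy = swe + 0.1 * rng.normal(size=n)       # near-duplicate of swe
precip = rng.normal(size=n)                     # independent predictor
flow = 0.8 * swe + 0.5 * precip + 0.3 * rng.normal(size=n)

cc = {name: pearson_cc(series, flow)
      for name, series in [("swe", swe), ("swe_copy", swe_copy),
                           ("precip", precip)]}

# Suppose swe is selected first; score the remaining candidates given swe.
pmi_copy = partial_mi(swe_copy, flow, swe)
pmi_precip = partial_mi(precip, flow, swe)

print("CC with flow:", cc)
print("PMI given swe: swe_copy =", pmi_copy, ", precip =", pmi_precip)
```

In this toy setting, the near-duplicate predictor is about as strongly correlated with the target as the original, yet it carries little additional information once the original is selected, whereas the independent predictor retains a clearly higher partial score. The histogram estimator is biased upward for finite samples, so small positive values should not be read as evidence of dependence.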
The preliminary results of this study indicate that PMI is quite effective in filtering some of the CC-based selections, which might otherwise form multiple equifinal sets of predictors. This step is critical for identifying the most relevant and necessary inputs. Despite the coarse spatial and temporal resolutions, which complicate crisp hydrologic perceptions, the CCM method seems to support the selection of various input variables on the basis of hydrologic causality, strengthening the transparency and credibility of ML models.
How to cite: Elshorbagy, A., Nguyen, D.-H., Khaliq, M. N., Akhtar, M. K., and Unduche, F.: Positioning ML Models for Spatial and Temporal Modeling of River Flows Through Causality and Information Content Analyses, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-3093, https://doi.org/10.5194/egusphere-egu25-3093, 2025.