EGU25-17403, updated on 15 Mar 2025
https://doi.org/10.5194/egusphere-egu25-17403
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
PICO | Tuesday, 29 Apr, 08:54–08:56 (CEST)
PICO spot A, PICOA.8
Exploring watershed similarities through machine learning and watershed descriptors: enhancing hydrological predictions
Gabriele Bertoli (1), Kai Schröter (2), Rossella Arcucci (3), and Enrica Caporali (1)
  • (1) Department of Civil and Environmental Engineering, Università degli studi di Firenze, Firenze, Italy
  • (2) Leichtweiß-Institute for Hydraulic Engineering and Water Resources, Technische Universität Braunschweig, Braunschweig, Germany
  • (3) Department of Earth Science & Engineering, Imperial College London, London, England

The increasing variability of precipitation and temperature extremes under climate change requires advanced methodologies to better understand and predict watershed responses. Watersheds with similar features are expected to exhibit comparable hydrological responses to meteorological events, and by clustering them, we aim to improve knowledge transfer from data-rich to data-scarce regions and enhance hydrological process analysis and prediction.

Our approach uses machine learning to cluster watersheds based on shared characteristics, such as topography, land cover, soil properties and geology, proposing an expanded perspective on watershed similarities and their implications for the understanding of hydrological phenomena.

We use 62 lumped watershed descriptors provided by the LamaH-CE large-sample hydrology dataset (https://doi.org/10.5194/essd-13-4529-2021), which include key attributes for each catchment such as area, mean elevation, slope, land use, NDVI, soil porosity, and rock permeability. A Principal Component Analysis (PCA) was first applied to reduce dimensionality and identify the most significant watershed descriptors. Next, four unsupervised learning models were implemented to cluster the watersheds using the selected descriptors: K-means, Gaussian Mixture Models (GMM), Hierarchical Clustering, and DBSCAN. The models' performance was systematically evaluated and compared in terms of shape factors and cluster interpretation across different watershed categories. Advanced dimensionality reduction techniques and arbitrary descriptor selection were tested to check the robustness of the procedure, and stability testing and hyperparameter optimization further supported the clustering results.

The resulting clusters were explored through detailed maps and 2D and 3D plots, revealing patterns of similarity across diverse geographic and hydrological regions in the LamaH-CE domain. For instance, watersheds characterized by large areas and modest elevation ranges fall in the same cluster, even if they are not hydrologically connected or close to each other. Especially at large spatial scales, where basins with different response types are analysed together, watershed clustering allows modelling and analysis techniques to be tailored to each cluster, providing additional and more precise knowledge of watershed behaviour.
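The PCA and clustering workflow described above could be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: it assumes scikit-learn, uses a synthetic 62-column table in place of the LamaH-CE descriptors, fixes the number of components and clusters at arbitrary values, and uses the silhouette score as a stand-in quality measure (the abstract does not specify how the shape factors were computed). The helper `silhouette_or_none` and all hyperparameter values are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the descriptor table (NOT LamaH-CE data):
# 3 artificial groups of 100 "catchments", 62 descriptors each.
rng = np.random.default_rng(0)
n_per_group, n_descriptors = 100, 62
centers = rng.normal(scale=4.0, size=(3, n_descriptors))
X = np.vstack(
    [c + rng.normal(size=(n_per_group, n_descriptors)) for c in centers]
)

# Standardize so descriptors with large units do not dominate the PCA.
X_scaled = StandardScaler().fit_transform(X)

# Reduce to 3 components (assumed value; convenient for 2D/3D plots).
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X_scaled)

# Fit the four unsupervised models named in the abstract.
# k, eps and min_samples are untuned, assumed values.
k = 3
labels = {
    "kmeans": KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_reduced),
    "gmm": GaussianMixture(n_components=k, random_state=0).fit_predict(X_reduced),
    "hierarchical": AgglomerativeClustering(n_clusters=k).fit_predict(X_reduced),
    "dbscan": DBSCAN(eps=1.0, min_samples=5).fit_predict(X_reduced),
}


def silhouette_or_none(X, y):
    """Silhouette score over non-noise points, or None if it is undefined."""
    mask = y != -1  # DBSCAN marks noise points with the label -1
    if len(set(y[mask])) < 2:
        return None  # the score needs at least two clusters
    return float(silhouette_score(X[mask], y[mask]))


scores = {name: silhouette_or_none(X_reduced, y) for name, y in labels.items()}
for name, score in scores.items():
    print(name, score)
```

On real descriptors, the number of components, the number of clusters and the DBSCAN parameters would all need tuning, and DBSCAN in particular may return a different number of clusters than the other three models, or label many catchments as noise.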

Future research steps will focus on testing this methodology as a basis for transferring knowledge from gauged to ungauged basins within the same cluster, enhancing predictive capabilities in data-scarce regions.

Beyond hydrological predictions, the clusters of watershed characteristics can also find applications in water resources planning and management in low-data regions, supporting more informed decision-making.

How to cite: Bertoli, G., Schröter, K., Arcucci, R., and Caporali, E.: Exploring watershed similarities through machine learning and watershed descriptors: enhancing hydrological predictions, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-17403, https://doi.org/10.5194/egusphere-egu25-17403, 2025.