EGU23-4440, updated on 22 Feb 2023
https://doi.org/10.5194/egusphere-egu23-4440
EGU General Assembly 2023
© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Developing a random forest model to quantify streamflow intermittency in Pan-Europe at a spatial resolution of 15 arc-sec

Mahdi Abbasi1, Tim Trautmann1, and Petra Döll1,2
  • 1Goethe University Frankfurt, Institute of Physical Geography, Hydrology Working Group, Frankfurt am Main, Germany (abbasi@em.uni-frankfurt.de)
  • 2Senckenberg Biodiversity and Climate Research Centre (SBiK-F), Frankfurt am Main, Germany

Intermittent streams, where water ceases to flow for part of the time, provide many ecosystem services and are a unique habitat for freshwater biota adapted to these conditions. Shifts in intermittency patterns, for example due to climate change, are problematic. To quantify streamflow intermittency in all of Europe at a spatial resolution of 15 arc-sec (approx. 500 m), we developed a machine learning (ML) approach that combines daily streamflow observations, the output of a global hydrological model, and other physiographic data to estimate monthly time series of the number of no-flow days.

Daily streamflow observations from an initial selection of close to 2000 gauging stations across Europe, taken from the SMIRES, GRDC and GSIM databases, were used as the target for training the ML model. We selected those stations with at least 18 complete monthly records (no missing days) in the period 1980-2019. Predictors include monthly time series of simulated hydrological indicators at two spatial resolutions, 15 arc-sec (high resolution, HR) and 0.5 arc-deg (approx. 50 km, low resolution, LR), as well as static HR environmental indicators (e.g. drainage area). The hydrological indicators were derived from the global hydrological model WaterGAP 2.2e. Its native LR output, including surface runoff and groundwater discharge, was used to compute HR time series of monthly streamflow across all of Europe. A comparison of the simulated HR streamflow with streamflow observations shows a reasonable fit. HR hydrological indicators include specific streamflow in the current and previous months. Examples of LR hydrological predictors include the ratio of groundwater recharge to total runoff and the daily streamflow variability within each month.
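The station selection and the monthly target variable described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the dictionary-based data layout, and the zero-discharge threshold for a no-flow day are assumptions made for illustration only.

```python
import calendar
from collections import defaultdict
from datetime import date, timedelta

# Assumption: a day counts as "no-flow" when discharge <= this threshold.
NO_FLOW_THRESHOLD = 0.0
# From the abstract: at least 18 complete monthly records in 1980-2019.
MIN_COMPLETE_MONTHS = 18


def monthly_no_flow_days(daily, threshold=NO_FLOW_THRESHOLD):
    """Return {(year, month): no-flow day count} for complete months only.

    `daily` maps datetime.date -> discharge; missing days are absent or None.
    """
    valid = defaultdict(int)  # days with an observation, per month
    dry = defaultdict(int)    # days at or below the threshold, per month
    for day, discharge in daily.items():
        if discharge is None:
            continue
        key = (day.year, day.month)
        valid[key] += 1
        if discharge <= threshold:
            dry[key] += 1
    # Keep a month only if every calendar day has an observation.
    return {
        key: dry[key]
        for key, n_days in valid.items()
        if n_days == calendar.monthrange(*key)[1]
    }


def is_eligible(daily, first_year=1980, last_year=2019):
    """True if the station has enough complete months within the period."""
    months = monthly_no_flow_days(daily)
    in_period = [k for k in months if first_year <= k[0] <= last_year]
    return len(in_period) >= MIN_COMPLETE_MONTHS


# Hypothetical record: January 2000 is complete with 3 dry days,
# February 2000 lacks its last day and is therefore dropped.
record = {}
for offset in range(31):
    record[date(2000, 1, 1) + timedelta(days=offset)] = 0.0 if offset < 3 else 1.5
for offset in range(28):
    record[date(2000, 2, 1) + timedelta(days=offset)] = 2.0
```

With this hypothetical record, only January 2000 contributes a monthly value, and the station would not be eligible because it has a single complete month.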

We adopted a sequential statistical modeling approach (binary classification in the first stage, multiclass classification in the second stage) because the data are zero-inflated and imbalanced. In the first stage, a Random Forest (RF) model classifies each month as either intermittent (with at least one no-flow day) or perennial. Then, taking into account only those stations that were either predicted or observed to be intermittent in the first stage, we developed a second model to predict four classes of intermittency (e.g. 1-2, 3-15, 16-27, or 28-31 no-flow days per month). Random oversampling of non-perennial gauging stations was implemented in both stages to address the bias in the RF model caused by the class imbalance in the training data. Three cross-validation techniques (non-spatial, spatial, and spatial-temporal) were applied for estimating model performance, hyperparameter tuning, and model selection. Balanced class accuracy, sensitivity, specificity, and precision supported the model selection. The most important predictors for streamflow intermittency will be presented, as well as the spatial distribution of the four intermittency classes in Europe (without Russia).
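The class definitions, the oversampling step, and the balanced accuracy metric mentioned above can be sketched as follows. This is an illustrative assumption rather than the authors' implementation: the helper names are hypothetical, the class boundaries are the example values from the text, and oversampling is done here per sample, whereas the study oversamples non-perennial gauging stations. The RF models themselves are not shown.

```python
import random
from collections import Counter, defaultdict

# Upper bounds of the four example intermittency classes (no-flow days per month).
CLASS_UPPER_BOUNDS = (2, 15, 27, 31)


def intermittency_class(no_flow_days):
    """Map a monthly no-flow day count to 0 (perennial) or 1..4 (intermittent)."""
    if not 0 <= no_flow_days <= 31:
        raise ValueError("no-flow days per month must lie between 0 and 31")
    if no_flow_days == 0:
        return 0
    for label, upper in enumerate(CLASS_UPPER_BOUNDS, start=1):
        if no_flow_days <= upper:
            return label


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly drawn minority samples until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    target = max(len(members) for members in by_class.values())
    out_samples, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_samples.extend(members + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recall, so each class weighs equally."""
    hits, totals = Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        hits[truth] += int(truth == pred)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# First-stage (binary) target derived from the monthly counts:
# 1 = intermittent month, 0 = perennial month.
monthly_counts = [0, 0, 0, 5]
binary_target = [int(n > 0) for n in monthly_counts]
```

Balanced accuracy illustrates why plain accuracy is a poor guide for zero-inflated data: a model that always predicts "perennial" for the four hypothetical months above is right three times out of four, yet its balanced accuracy is only 0.5.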

How to cite: Abbasi, M., Trautmann, T., and Döll, P.: Developing a random forest model to quantify streamflow intermittency in Pan-Europe at a spatial resolution of 15 arc-sec, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-4440, https://doi.org/10.5194/egusphere-egu23-4440, 2023.