EGU25-5646, updated on 14 Mar 2025
https://doi.org/10.5194/egusphere-egu25-5646
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Wednesday, 30 Apr, 17:50–18:00 (CEST)
 
Room 3.16/17
Cross-continental application of a random forest model for streamflow intermittence from data-rich to data-poor regions  
Mahdi Abbasi1 and Petra Döll1,2
  • 1Institute of Physical Geography, Goethe University Frankfurt, Frankfurt/Main, Germany
  • 2Senckenberg Leibniz Biodiversity and Climate Research Centre (SBiK-F) Frankfurt, Frankfurt/Main, Germany

Streamflow intermittence, i.e. the occurrence of days without streamflow, was predicted at high spatial resolution for the whole of Europe by downscaling the output of a global hydrological model and using the resulting monthly time series to compute predictors that, together with other predictors, drive a random forest (RF) model simulating the number of no-flow days (Döll et al. 2024). Developing this data-driven model required a large number of daily streamflow observations. The challenge now is to transfer what was learned from this modeling work to continents with fewer daily streamflow observations, such as South America. How good is the simulated streamflow intermittence in South America if the RF model trained for Europe is applied there, i.e. run with predictors specific to South America?

We focused on three main aspects: 1) evaluating the similarity of predictor values between the training continent, Europe, and the application continent, South America, 2) conducting a sensitivity analysis for the number of observations and 3) applying different explainable AI methods. For the first aspect, we performed two analyses: a) comparing the probability distributions of 23 predictors across both continents and b) applying the area of applicability (AOA) analysis for the period 1981-2019. The AOA indicates where the predictor values in South America fall within the range of values used to develop the RF model trained on European data. It thus identifies the areas where the model's predictions are likely to be most reliable because environmental conditions there resemble those in the training data.
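
The range criterion behind the AOA, as described above, can be illustrated with a minimal Python sketch. It is a deliberate simplification and not the analysis used in this study: it only flags whether every predictor of a target cell lies between the minimum and maximum seen in training, whereas AOA implementations typically rely on a distance-based dissimilarity measure in predictor space. The function names and the dictionary-per-cell data layout are assumptions made for illustration.

```python
from typing import Dict, Sequence, Tuple

Row = Dict[str, float]  # one sample: predictor name -> value (hypothetical layout)


def training_ranges(train_rows: Sequence[Row]) -> Dict[str, Tuple[float, float]]:
    """Return the (min, max) of each predictor over the training samples."""
    names = list(train_rows[0].keys())
    return {
        name: (
            min(row[name] for row in train_rows),
            max(row[name] for row in train_rows),
        )
        for name in names
    }


def inside_training_range(row: Row, ranges: Dict[str, Tuple[float, float]]) -> bool:
    """True if every predictor value of `row` lies within its training range."""
    return all(low <= row[name] <= high for name, (low, high) in ranges.items())


def share_inside(target_rows: Sequence[Row],
                 ranges: Dict[str, Tuple[float, float]]) -> float:
    """Fraction of target samples whose predictors all lie within the training ranges."""
    flags = [inside_training_range(row, ranges) for row in target_rows]
    return sum(flags) / len(flags)
```

Here, `training_ranges` would be computed from the European training samples and `share_inside` evaluated on South American cells; a low share would indicate that the model is largely extrapolating beyond the conditions it was trained on.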

We also analyzed the sensitivity of simulated streamflow intermittence to the number of gauge-months with observed no-flow days. To this end, we 1) built several models, each trained on a randomly selected subset of European gauging stations (i.e., 50% of the total) and including all monthly values for these stations, 2) evaluated the performance of each model on the remaining gauging stations not used in training and 3) compared the resulting continent-wide streamflow intermittence patterns across Europe to assess the consistency and variability of the predictions. Finally, we used various explainable AI methods to analyze the influence of each predictor on the results of the RF model. This analysis helps to identify potential biases and to understand how the model performs in different geographical contexts. Without explainable AI, a model transferred to another continent may fail to represent region-specific conditions, which undermines its effectiveness and reliability across diverse geographical areas.
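
The station-level split used in the sensitivity analysis can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the code used in this study: records are assumed to be dictionaries with a hypothetical `"station"` key, and model fitting and evaluation are omitted. The point of the sketch is that whole stations, not individual gauge-months, are assigned to training or testing, so that all monthly values of a station end up on the same side of the split.

```python
import random
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Record = Dict[str, object]  # one gauge-month; must contain a "station" key (assumed)


def split_stations(station_ids: Sequence[Hashable],
                   train_fraction: float = 0.5,
                   seed: int = 0) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Randomly assign unique stations to a training and a test set."""
    unique = sorted(set(station_ids))      # sort first so a given seed is reproducible
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_train = int(len(unique) * train_fraction)
    return set(unique[:n_train]), set(unique[n_train:])


def split_records(records: Sequence[Record],
                  train_fraction: float = 0.5,
                  seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Split gauge-month records so that each station is entirely in one subset."""
    train_ids, test_ids = split_stations(
        [rec["station"] for rec in records], train_fraction, seed
    )
    train = [rec for rec in records if rec["station"] in train_ids]
    test = [rec for rec in records if rec["station"] in test_ids]
    return train, test
```

Calling `split_records` with different `seed` values yields the different random subsets; each training subset would then be used to fit a separate model that is evaluated on the corresponding held-out stations.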

How to cite: Abbasi, M. and Döll, P.: Cross-continental application of a random forest model for streamflow intermittence from data-rich to data-poor regions, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-5646, https://doi.org/10.5194/egusphere-egu25-5646, 2025.