EGU25-18245, updated on 15 Mar 2025
https://doi.org/10.5194/egusphere-egu25-18245
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Monday, 28 Apr, 17:35–17:45 (CEST) | Room B
Do spatial random forest variants improve the regionalization of environmental pollutants? - The case of groundwater nitrate concentration
Jonathan Frank1,2, Thomas Suesse1,2,3, Shijie Jiang2,4, and Alexander Brenning1,2
  • 1Friedrich-Schiller-University Jena, Institute of Geography, Geographic Information Science, Jena, Germany
  • 2ELLIS Unit Jena, Jena, Germany
  • 3National Institute for Applied Statistics Research Australia, School of Mathematics and Applied Statistics, University of Wollongong, NSW 2522 Australia
  • 4Max Planck Institute for Biogeochemistry, Department of Biogeochemical Integration, Jena, Germany

Machine learning models, particularly Random Forests (RF), are increasingly used to regionalize environmental pollutants based on point measurements. Spatial variants of RF are emerging to account for geospatial data characteristics, such as spatial autocorrelation and non-stationarity. However, systematic comparisons of these spatial RF variants remain limited.

This study evaluates seven spatial RF variants and compares them with non-spatial RF, multiple linear regression (MLR), and universal kriging (UK), a well-established geostatistical method. Using nitrate concentrations in groundwater from two contrasting hydrogeological macro-regions in Germany, we assess predictive performance, measured as mean absolute error, across varying prediction distances using spatial cross-validation.
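
To illustrate how predictive performance can be assessed across prediction distances, the following is a minimal sketch of one possible design: a buffered leave-one-out cross-validation in which all training points within a given radius of the held-out point are dropped. The abstract does not specify the exact cross-validation scheme, so the function names (`buffered_loo_mae`, `nearest_neighbour`), the synthetic data, the buffer radii, and the nearest-neighbour placeholder model are all assumptions made for illustration only.

```python
import numpy as np

def buffered_loo_mae(coords, y, fit_predict, buffer_dist):
    """Leave-one-out CV with a spatial buffer (hypothetical helper).

    Training points closer than or equal to buffer_dist to the held-out
    point are excluded, which enforces a minimum prediction distance.
    """
    abs_errors = []
    for i in range(len(y)):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        train = dist > buffer_dist  # also drops the held-out point (dist = 0)
        if not train.any():
            continue  # no training data left outside the buffer
        pred = fit_predict(coords[train], y[train], coords[i:i + 1])[0]
        abs_errors.append(abs(pred - y[i]))
    return float(np.mean(abs_errors))

def nearest_neighbour(train_xy, train_y, test_xy):
    """Placeholder model: value of the nearest training point."""
    d = np.linalg.norm(test_xy[:, None, :] - train_xy[None, :, :], axis=2)
    return train_y[d.argmin(axis=1)]

# Synthetic, spatially autocorrelated data (not the nitrate data set).
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 100.0, size=(200, 2))
y = (np.sin(coords[:, 0] / 15.0) + np.cos(coords[:, 1] / 15.0)
     + rng.normal(scale=0.1, size=200))

mae_by_distance = {
    d: buffered_loo_mae(coords, y, nearest_neighbour, d)
    for d in (0.0, 10.0, 30.0)
}
for d, mae in mae_by_distance.items():
    print(f"buffer {d:5.1f}: MAE = {mae:.3f}")
```

Any model with the same `fit_predict(train_xy, train_y, test_xy)` signature could be substituted for the placeholder, which makes it possible to compare several methods at identical prediction distances.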

The results show only minor differences among the spatial RF variants, except for the notably poorer performance of Random Forest Spatial Interpolation (RFSI) at long prediction distances. Over short distances (within the practical range of spatial autocorrelation), spatial RF variants outperformed non-spatial RF and MLR. The RF-oob-OK method, which applies ordinary kriging to the out-of-bag errors, demonstrated consistently strong performance with acceptable computational efficiency. However, it did not substantially surpass UK in predictive performance.
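
The idea behind RF-oob-OK, as described above, can be sketched as follows: fit a random forest, compute its out-of-bag errors at the training locations, interpolate those errors with ordinary kriging, and add the interpolated error to the forest prediction. This is a hedged illustration, not the authors' implementation. It assumes scikit-learn and NumPy, uses synthetic data, and uses an exponential variogram with fixed, hand-picked parameters (`nugget`, `psill`, `range_par`) where a real analysis would fit the variogram to the out-of-bag errors. The helper names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def exp_variogram(h, nugget, psill, range_par):
    """Exponential semivariogram; practical range is about 3 * range_par."""
    return nugget + psill * (1.0 - np.exp(-h / range_par))

def ordinary_kriging(train_xy, values, test_xy,
                     nugget=0.2, psill=1.0, range_par=20.0):
    """Ordinary kriging with ASSUMED (not fitted) variogram parameters."""
    n = len(values)
    d_tt = np.linalg.norm(train_xy[:, None, :] - train_xy[None, :, :], axis=2)
    # Kriging system; the extra row/column is the Lagrange multiplier
    # that forces the weights to sum to one.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_variogram(d_tt, nugget, psill, range_par)
    A[np.arange(n), np.arange(n)] = 0.0  # gamma(0) = 0
    A[n, n] = 0.0
    d_tp = np.linalg.norm(train_xy[:, None, :] - test_xy[None, :, :], axis=2)
    b = np.ones((n + 1, len(test_xy)))
    b[:n, :] = exp_variogram(d_tp, nugget, psill, range_par)
    weights = np.linalg.solve(A, b)[:n, :]
    return weights.T @ values

# Synthetic data: one covariate plus a smooth spatial signal (illustrative).
rng = np.random.default_rng(42)
n = 300
xy = rng.uniform(0.0, 100.0, size=(n, 2))
covariate = rng.normal(size=n)
y = (2.0 * covariate + np.sin(xy[:, 0] / 15.0) + np.cos(xy[:, 1] / 15.0)
     + rng.normal(scale=0.1, size=n))
X = covariate[:, None]
train = np.arange(n) < 200
test = ~train

# Step 1: non-spatial RF with out-of-bag predictions enabled.
rf = RandomForestRegressor(n_estimators=300, min_samples_leaf=5,
                           oob_score=True, random_state=0)
rf.fit(X[train], y[train])

# Step 2: out-of-bag errors at the training locations.
oob_errors = y[train] - rf.oob_prediction_

# Step 3: krige the errors to the test locations and add them to the RF output.
rf_pred = rf.predict(X[test])
hybrid_pred = rf_pred + ordinary_kriging(xy[train], oob_errors, xy[test])

print("MAE, RF only:   ", np.mean(np.abs(y[test] - rf_pred)))
print("MAE, RF-oob-OK: ", np.mean(np.abs(y[test] - hybrid_pred)))
```

Using out-of-bag rather than in-sample errors matters because a random forest fits its training data closely, so in-sample residuals would understate the spatial structure left for kriging to capture.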

Computationally manageable spatial RF variants, such as RF-oob-OK, represent viable alternatives to traditional geostatistical methods for spatial prediction of environmental pollutants, effectively exploiting both spatial predictors and autocorrelation.

How to cite: Frank, J., Suesse, T., Jiang, S., and Brenning, A.: Do spatial random forest variants improve the regionalization of environmental pollutants? - The case of groundwater nitrate concentration, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-18245, https://doi.org/10.5194/egusphere-egu25-18245, 2025.