- 1 Friedrich-Schiller-University Jena, Institute of Geography, Geographic Information Science, Jena, Germany
- 2 ELLIS Unit Jena, Jena, Germany
- 3 National Institute for Applied Statistics Research Australia, School of Mathematics and Applied Statistics, University of Wollongong, NSW 2522, Australia
- 4 Max Planck Institute for Biogeochemistry, Department of Biogeochemical Integration, Jena, Germany
Machine learning models, particularly Random Forests (RF), are increasingly used to regionalize environmental pollutants based on point measurements. Spatial variants of RF are emerging to account for geospatial data characteristics, such as spatial autocorrelation and non-stationarity. However, systematic comparisons of these spatial RF variants remain limited.
This study evaluates seven spatial RF variants and compares them with non-spatial RF, multiple linear regression (MLR), and universal kriging (UK), a well-established geostatistical method. Using nitrate concentrations in groundwater from two contrasting hydrogeological macro-regions in Germany, we assess predictive performance, measured as mean absolute error, across varying prediction distances by means of spatial cross-validation.
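One way to evaluate error as a function of prediction distance is a buffered leave-one-out cross-validation, sketched below. This is an illustration only: the abstract does not state which spatial cross-validation design was used, and the function names, the nearest-neighbour stand-in model, and the toy data are all hypothetical.

```python
import math


def mean_absolute_error(observed, predicted):
    """Average of |observed - predicted| over all held-out points."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def buffered_loo_mae(coords, values, fit_predict, buffer_dist):
    """Leave-one-out CV that drops training points closer than buffer_dist
    to the held-out point, so each prediction is made at least that far
    from the nearest training observation (assumed design, see lead-in)."""
    observed, predicted = [], []
    for i, (xi, yi) in enumerate(coords):
        train = [
            j for j, (xj, yj) in enumerate(coords)
            if j != i and math.hypot(xi - xj, yi - yj) >= buffer_dist
        ]
        if not train:
            continue  # nothing left outside the buffer; skip this point
        pred = fit_predict(
            [coords[j] for j in train], [values[j] for j in train], coords[i]
        )
        observed.append(values[i])
        predicted.append(pred)
    return mean_absolute_error(observed, predicted)


def nearest_neighbour(train_coords, train_values, target):
    """Hypothetical stand-in model: return the value of the closest
    training point. Any model with this call signature could be used."""
    dists = [math.hypot(target[0] - x, target[1] - y) for x, y in train_coords]
    return train_values[dists.index(min(dists))]


# Toy data: four wells on a line with a linear trend in the measured value.
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
values = [1.0, 2.0, 3.0, 4.0]

# MAE for two prediction distances (buffer radii).
mae_by_distance = {
    d: buffered_loo_mae(coords, values, nearest_neighbour, d) for d in (0.0, 1.5)
}
```

In this toy setting the error grows with the buffer radius, because the nearest usable training point moves further away from the held-out location.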
The results show only minor differences in predictive performance among the spatial RF variants, except for the notably lower performance of Random Forest Spatial Interpolation (RFSI) at long prediction distances. Over short distances (within the practical range of spatial autocorrelation), the spatial RF variants outperformed non-spatial RF and MLR. The RF-oob-OK method, which applies ordinary kriging to the out-of-bag errors of the RF, performed consistently well with acceptable computational efficiency. However, it did not substantially surpass UK in predictive performance.
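The idea behind RF-oob-OK, kriging the out-of-bag errors and adding the result to the RF prediction, can be sketched as follows. This is a minimal illustration under several assumptions that are not taken from the abstract: the RF predictions and out-of-bag residuals (observed minus out-of-bag prediction) are treated as already computed, the exponential covariance model and its `sill` and `range_` values are placeholders rather than fitted parameters, and all function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def exp_cov(h, sill, range_):
    """Exponential covariance without nugget (assumed model)."""
    return sill * math.exp(-h / range_)


def krige_residual(coords, residuals, target, sill=1.0, range_=1.0):
    """Ordinary kriging of residuals at one target location."""
    n = len(coords)

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    # Kriging system: covariances between data points, bordered by the
    # unbiasedness constraint that the weights sum to one.
    a = [
        [exp_cov(dist(coords[i], coords[j]), sill, range_) for j in range(n)] + [1.0]
        for i in range(n)
    ]
    a.append([1.0] * n + [0.0])
    b = [exp_cov(dist(p, target), sill, range_) for p in coords] + [1.0]
    weights = solve(a, b)[:n]  # last unknown is the Lagrange multiplier
    return sum(w * r for w, r in zip(weights, residuals))


def rf_oob_ok_predict(rf_prediction, coords, oob_residuals, target, **cov):
    """RF prediction at the target plus the kriged out-of-bag residual."""
    return rf_prediction + krige_residual(coords, oob_residuals, target, **cov)


# Toy example: two training wells with known out-of-bag residuals.
coords = [(0.0, 0.0), (2.0, 0.0)]
oob_residuals = [1.0, 3.0]
midpoint_prediction = rf_oob_ok_predict(10.0, coords, oob_residuals, (1.0, 0.0))
```

In practice the covariance (or variogram) parameters would be estimated from the out-of-bag residuals rather than fixed, and a nugget term would typically be included.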
Computationally manageable spatial RF variants, such as RF-oob-OK, represent viable alternatives to traditional geostatistical methods for spatial prediction of environmental pollutants, effectively exploiting both spatial predictors and autocorrelation.
How to cite: Frank, J., Suesse, T., Jiang, S., and Brenning, A.: Do spatial random forest variants improve the regionalization of environmental pollutants? - The case of groundwater nitrate concentration, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-18245, https://doi.org/10.5194/egusphere-egu25-18245, 2025.