- University of Tartu, Institute of Ecology and Earth Sciences, Department of Geography, Tartu, Estonia (alexander.kmoch@ut.ee)
Spatial autocorrelation, the relationship between nearby samples of a spatial random variable, is often overlooked in machine learning models, leading to biased results. We investigated various methods to account for, address, and integrate spatial autocorrelation when modelling and predicting soil organic carbon (SOC) with random forest (RF) models. We created and evaluated five different RF models that incorporate spatial structure through methods such as buffer distances, KNN/RFSI coordinates, GWRFR, and kriging/RFRK. These were compared against a baseline model without any added spatial components. Cross-validation showed slight improvements in accuracy for the models that consider spatial autocorrelation, while Shapley Additive Explanations confirmed the importance of the spatial variables. However, no decrease in the spatial autocorrelation of residuals was observed. The raster-based models exhibited enhanced prediction detail, but the limited availability of high-resolution validation data prevented thorough validation. The findings emphasize the value of incorporating spatial autocorrelation for improved SOC prediction in machine learning models. We applied the models to predict SOC for the whole of Estonia at 10 m raster resolution. Computational differences provided additional insights into pragmatic choices of models.
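The abstract does not state which statistic was used to assess the spatial autocorrelation of residuals. As an illustration only, the sketch below computes global Moran's I, a common choice for this purpose, using binary k-nearest-neighbour weights. The function names, the choice of k = 4, and the synthetic grid data are assumptions made for this example and are not taken from the study.

```python
import numpy as np


def knn_weights(coords, k):
    """Binary spatial weights: w[i, j] = 1 if j is one of i's k nearest points."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]
    w = np.zeros_like(dist)
    rows = np.repeat(np.arange(len(coords)), k)
    w[rows, nearest.ravel()] = 1.0
    return w


def morans_i(values, w):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), with z the centred values."""
    z = np.asarray(values, dtype=float) - np.mean(values)
    n = z.size
    return (n / w.sum()) * (z @ w @ z) / (z @ z)


# Hypothetical residuals on a 10 x 10 grid (synthetic, for illustration only).
xs, ys = np.meshgrid(np.arange(10), np.arange(10))
coords = np.column_stack([xs.ravel(), ys.ravel()])
w = knn_weights(coords, k=4)

# Residuals that follow a smooth west-east trend are strongly autocorrelated.
gradient_residuals = coords[:, 0].astype(float)
i_gradient = morans_i(gradient_residuals, w)

# Shuffling the same values destroys the spatial structure.
rng = np.random.default_rng(0)
shuffled_residuals = rng.permutation(gradient_residuals)
i_shuffled = morans_i(shuffled_residuals, w)

print(f"Moran's I, trending residuals: {i_gradient:.2f}")
print(f"Moran's I, shuffled residuals: {i_shuffled:.2f}")
```

Values of Moran's I well above zero indicate that similar residuals cluster in space, while values near zero indicate no detectable spatial structure. Computing the statistic on the residuals of a baseline model and of a spatially informed model is one way to check whether the added spatial components absorb the remaining spatial pattern.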
How to cite: Kmoch, A., Choi, J., Harrison, C. T., and Uuemaa, E.: Spatial autocorrelation in machine learning for modelling soil organic carbon, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-16038, https://doi.org/10.5194/egusphere-egu25-16038, 2025.