EGU23-16096
https://doi.org/10.5194/egusphere-egu23-16096
EGU General Assembly 2023
© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Limitations of machine learning in a spatial context

Jens Heinke, Christoph Müller, and Dieter Gerten
  • Potsdam Institute for Climate Impact Research, Potsdam, Germany

Machine learning algorithms have become popular tools for the analysis of spatial data. However, several studies have demonstrated that applying machine learning algorithms in a spatial context has limitations. New geographic locations may lie outside the data range for which the model was trained, and estimates of model performance may be too optimistic when the spatial autocorrelation of geographic data is not properly accounted for in cross-validation. Here, we use artificially created spatial data fields to conduct a series of experiments that further investigate the potential pitfalls of random forest regression applied to spatial data. We provide new insights into previously reported limitations and identify further ones. We demonstrate that the same mechanism that leads to overoptimistic estimates of model performance (when based on ordinary random k-fold cross-validation) can also lead to a deterioration of model performance. When covariates contain sufficient information to deduce spatial coordinates, the model can reproduce any spatial pattern in the training data, even if that pattern is entirely or partly unrelated to the covariates. The presence of spatially correlated residuals in the training data changes how the model uses the information in the covariates and impedes the identification of the actual relationship between covariates and response. This reduces model performance when the model is applied to data with a different spatial structure. Under such conditions, machine learning methods that are flexible enough to fit autocorrelated residuals (such as random forest) may not be an optimal choice. Better models may be obtained with less flexible but more transparent approaches such as generalized linear models or additive models.
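The overoptimism of random k-fold cross-validation described above can be illustrated with a minimal sketch. It is not the authors' experimental setup: the grid size, the two gradient covariates, the plane-wave construction of the response field, the 4 x 4 blocking scheme, and all variable names are assumptions chosen for illustration, and scikit-learn is assumed to be available. The response is generated without any reference to the covariates, but because the covariates jointly encode location, a random forest can still reproduce the spatial pattern when test points have nearby training neighbours.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(0)

# Regular 30 x 30 grid on the unit square (hypothetical study area).
side = 30
gx, gy = np.meshgrid(np.linspace(0, 1, side), np.linspace(0, 1, side))
x, y = gx.ravel(), gy.ravel()

# Two smooth gradient covariates (assumed). Together they determine the
# location of every point, so the model can implicitly recover coordinates.
covariates = np.column_stack([x + 0.5 * y, y - 0.5 * x])

# Spatially autocorrelated response built as a sum of random plane waves.
# It is generated from x and y only, never from the covariates themselves.
n_waves = 8
freq = rng.uniform(1.0, 3.0, size=(n_waves, 2)) * rng.choice([-1, 1], size=(n_waves, 2))
phase = rng.uniform(0.0, 2.0 * np.pi, size=n_waves)
amp = rng.normal(size=n_waves)
waves = np.cos(2.0 * np.pi * (np.outer(x, freq[:, 0]) + np.outer(y, freq[:, 1])) + phase)
response = (amp * waves).sum(axis=1)

# Assign every point to one of 4 x 4 spatial blocks for blocked validation.
n_blocks = 4
bx = np.minimum((x * n_blocks).astype(int), n_blocks - 1)
by = np.minimum((y * n_blocks).astype(int), n_blocks - 1)
block_id = bx * n_blocks + by

model = RandomForestRegressor(n_estimators=50, random_state=0)

# Ordinary random k-fold: test points are surrounded by training points.
r2_random = cross_val_score(
    model, covariates, response,
    cv=KFold(n_splits=4, shuffle=True, random_state=0), scoring="r2",
).mean()

# Spatially blocked k-fold: whole blocks are withheld from training.
r2_blocked = cross_val_score(
    model, covariates, response,
    groups=block_id, cv=GroupKFold(n_splits=4), scoring="r2",
).mean()

print(f"mean R2, random k-fold:  {r2_random:.2f}")
print(f"mean R2, blocked k-fold: {r2_blocked:.2f}")
```

In this sketch the random k-fold score should come out clearly higher than the blocked score, because shuffled folds let the model interpolate between close neighbours, whereas withheld blocks force it to predict into regions it has not seen. The size of the gap depends on the assumed block size relative to the correlation length of the field.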

How to cite: Heinke, J., Müller, C., and Gerten, D.: Limitations of machine learning in a spatial context, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-16096, https://doi.org/10.5194/egusphere-egu23-16096, 2023.