Using Moran's I for assessing residual spatial autocorrelation in machine learning models

Jakub Nowosad; Hanna Meyer; Jonas Schmidinger

doi:https://doi.org/10.5194/egusphere-egu26-11342

Session ESSI1.7

EGU26-11342, updated on 14 Mar 2026

EGU General Assembly 2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

Jakub Nowosad¹,², Hanna Meyer¹, and Jonas Schmidinger³,⁴

¹Institute of Landscape Ecology, University of Münster, Münster, Germany
²Institute of Geoecology and Geoinformation, Adam Mickiewicz University, Poznań, Poland
³Joint Lab Artificial Intelligence and Data Science, Osnabrück University, Osnabrück, Germany
⁴Department of Agromechatronics, Leibniz Institute for Agricultural Engineering and Bioeconomy (ATB), Potsdam, Germany

Understanding the spatial dependence of residuals is important for interpreting and diagnosing spatial machine learning models. Spatial autocorrelation in the residuals suggests that the model has not fully captured the data's spatial structure. This may imply that the model is missing crucial spatial context or interactions, and that, in effect, it is spatially biased, leading to underestimation in some areas and overestimation in others.

Moran's I is a commonly used statistic for the diagnosis of spatial autocorrelation in spatial predictions, providing a single-value quantitative measure with a straightforward interpretation. This measure quantifies the degree of spatial autocorrelation, indicating whether similar values are clustered together or dispersed across space. The information provided by Moran's I has been used in various ways in studies applying machine learning: to evaluate model performance, interpret results, understand model limitations, and compare different modeling approaches.
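
For reference, the standard global form of the statistic can be written as follows. The notation is chosen here for illustration and is not taken from the abstract: e_i is the residual at location i, \bar{e} is the mean residual, n is the number of locations, and w_{ij} is the spatial weight linking locations i and j.

```latex
\[
I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}}
    \cdot
    \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(e_i - \bar{e})(e_j - \bar{e})}
         {\sum_{i=1}^{n} (e_i - \bar{e})^{2}}
\]
```

Under the null hypothesis of no spatial autocorrelation, the expected value is -1/(n-1), which approaches zero as n grows. Values above it indicate clustering of similar residuals, and values below it indicate dispersion.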

Unlike standard model performance metrics, such as R² or RMSE, Moran's I depends not only on the values of residuals but also on the spatial context, especially the study area's extent, the sampling strategy used, and the specification of spatial weights. However, it is not yet well understood how these factors influence Moran's I in the context of spatial machine learning, or how best to use this measure for model evaluation and comparison.

Using simulated data with controlled spatial properties, we investigated how testing set size, sampling strategy, and the specification of spatial weights influence Moran's I computed on model residuals. Our results show that Moran's I, when calculated with k-nearest-neighbor spatial weights, primarily reflects the spatial structure of values in the testing set rather than the residual autocorrelation across the full prediction domain, often underestimating fine-scale spatial patterns. These findings have several implications: weight-matrix definitions must be clearly reported, calculations on sparsely distributed or clustered samples should be avoided, Moran's I is generally not directly comparable across studies due to differences in spatial extents and sampling, and its values are inherently scale-dependent.
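
As a rough illustration of the kind of calculation discussed above, the following sketch computes Moran's I with row-standardized k-nearest-neighbor weights on synthetic values. It is not the authors' simulation setup: the grid, the synthetic "residual" field, the choice k = 4, the sample size of 40, and the helper names `knn_weights` and `morans_i` are all assumptions made for this example, and it uses only NumPy.

```python
import numpy as np

def knn_weights(coords, k):
    """Dense, row-standardized k-nearest-neighbor weights (hypothetical helper)."""
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    # Pairwise Euclidean distances between all locations.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a location is not its own neighbor
    idx = np.argsort(d, axis=1)[:, :k]   # indices of the k closest locations
    w = np.zeros((n, n))
    w[np.arange(n)[:, None], idx] = 1.0 / k  # each row sums to 1
    return w

def morans_i(values, w):
    """Global Moran's I for `values` under the weight matrix `w`."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    return (z.size / w.sum()) * (z @ w @ z) / (z @ z)

rng = np.random.default_rng(0)

# Assumed setup: a 20 x 20 grid standing in for the full prediction domain.
xx, yy = np.meshgrid(np.arange(20), np.arange(20))
coords = np.column_stack([xx.ravel(), yy.ravel()])

# Synthetic "residuals": one spatially structured field, one pure noise field.
structured = np.sin(coords[:, 0] / 1.5) + np.cos(coords[:, 1] / 1.5)
noise = rng.normal(size=coords.shape[0])

w_full = knn_weights(coords, k=4)
i_full = morans_i(structured, w_full)
i_noise = morans_i(noise, w_full)

# The same structured field observed only at a sparse random "testing set".
sample = rng.choice(coords.shape[0], size=40, replace=False)
i_sparse = morans_i(structured[sample], knn_weights(coords[sample], k=4))

print(f"full grid, structured: {i_full:.3f}")
print(f"full grid, noise:      {i_noise:.3f}")
print(f"sparse sample:         {i_sparse:.3f}")
```

Because the k nearest neighbors of a sparsely sampled point lie farther away than on the full grid, the same k describes a different spatial scale in each case, which is one reason the weight definition and the sampling design need to be reported alongside the value.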

With this contribution, we aim to present the behavior of Moran's I calculated from residuals of spatial machine learning models under different conditions, outline best practices for selecting and reporting spatial weights, and discuss how to interpret Moran's I.

How to cite: Nowosad, J., Meyer, H., and Schmidinger, J.: Using Moran's I for assessing residual spatial autocorrelation in machine learning models, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-11342, https://doi.org/10.5194/egusphere-egu26-11342, 2026.