- ¹Earth and environmental science, University of Bari, Bari, Italy (domenico.capolongo@uniba.it)
- ²Margosa Environmental Solutions Ltd., Brandon House, 1st Floor, 90 The Broadway, Chesham, HP5 1EG, UK
Large-scale landslide susceptibility modelling (LSM) has evolved rapidly with the availability of large landslide inventories and high-resolution global environmental datasets. However, two fundamental issues continue to undermine the reliability and interpretability of such models: spatial autocorrelation (SAC) in landslide occurrences and strong feature redundancy (FR) among terrain-derived and environmental predictors. Although both factors are known to affect model outcomes, their distinct and relative impacts on predictive performance and model interpretability at large to global scales remain poorly disentangled. Here, we present a comparative modelling study that systematically evaluates how SAC and FR each influence the outcomes of large-scale landslide susceptibility models. Using the Unified Global Landslide Catalog (UGLC, https://essd.copernicus.org/preprints/essd-2025-482/), which includes more than one million landslide records and an equal number of geomorphologically constrained non-landslide samples, we implement Random Forest (RF) models under three experimental settings: (i) conventional random train–test splitting, (ii) spatial k-means clustered train–test splitting to mitigate SAC-induced bias, and (iii) random train–test splitting combined with feature de-correlation to mitigate multicollinearity among 60 global geomorphological, hydrological, geological, and soil predictors. Our results show that random splitting leads to strongly optimistic performance (accuracy ≈ 0.96), dominated by spatial dependence between training and testing samples. When spatial clustering is applied, model performance decreases markedly (accuracy ≈ 0.82), exposing the true predictive capability under spatial independence. Feature de-correlation does not address SAC-related bias but produces measurable gains in robustness (accuracy ≈ 0.94) and improves model interpretability across diverse climatic and geomorphological settings.
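The difference between settings (i) and (ii) can be illustrated with a minimal sketch. It assumes scikit-learn and NumPy, synthetic planar coordinates in place of UGLC samples, and hypothetical choices for the number of clusters and the test fraction; the abstract does not state the actual configuration. The function names `random_split` and `spatial_kmeans_split` are illustrative only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import GroupShuffleSplit, train_test_split


def random_split(n_samples, test_size=0.3, seed=0):
    """Setting (i): conventional random train-test split on sample indices."""
    idx = np.arange(n_samples)
    train_idx, test_idx = train_test_split(idx, test_size=test_size, random_state=seed)
    return train_idx, test_idx


def spatial_kmeans_split(coords, n_clusters=20, test_size=0.3, seed=0):
    """Setting (ii): cluster coordinates with k-means, then hold out whole clusters."""
    # Assumption: Euclidean k-means on planar coordinates. Global lon/lat data
    # would typically be projected or converted before clustering.
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(coords)
    # GroupShuffleSplit assigns entire groups (clusters) to either train or test.
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(splitter.split(coords, groups=labels))
    return train_idx, test_idx, labels


# Synthetic stand-in for sample locations (not UGLC data).
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 100.0, size=(500, 2))

rand_train, rand_test = random_split(len(coords))
sp_train, sp_test, labels = spatial_kmeans_split(coords)

# Under the spatial split, no cluster contributes samples to both sets.
shared_clusters = set(labels[sp_train]) & set(labels[sp_test])
print("clusters shared between train and test:", len(shared_clusters))
```

Because whole clusters are held out, test samples are spatially separated from the training samples, which is what removes the proximity advantage that a random split gives the model.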
These findings highlight that SAC and FR affect global LSM in fundamentally different ways. Spatial train–test splitting is indispensable for unbiased performance assessment, whereas feature de-correlation serves as a complementary strategy to enhance model stability and interpretability. This distinction is critical for the development of scientifically interpretable and operationally reliable global landslide susceptibility models.
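Feature de-correlation can take many forms, and the abstract does not specify which one was used. As one possible illustration, the sketch below applies a greedy pairwise Pearson-correlation filter with NumPy. The 0.8 threshold, the function name `decorrelate_features`, and the predictor names are all hypothetical, and the data are synthetic.

```python
import numpy as np


def decorrelate_features(X, names, threshold=0.8):
    """Greedily keep features whose |Pearson r| with every kept feature is below threshold."""
    # Absolute correlation matrix between columns (features) of X.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        # Keep feature j only if it is not strongly correlated with any kept feature.
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return X[:, kept], [names[j] for j in kept]


# Synthetic predictors (hypothetical names, not the 60 predictors of the study).
rng = np.random.default_rng(0)
n = 1000
slope = rng.normal(size=n)
roughness = slope + 0.05 * rng.normal(size=n)  # nearly redundant with slope
rainfall = rng.normal(size=n)                  # independent of slope
X = np.column_stack([slope, roughness, rainfall])
names = ["slope", "roughness", "rainfall"]

X_reduced, kept_names = decorrelate_features(X, names)
print(kept_names)
```

A filter of this kind depends on feature order, since the first member of each correlated group is the one retained. Ranking predictors by domain relevance before filtering is one way to make that choice deliberate.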
How to cite: Capolongo, D., Mancino, S., and Amatulli, G.: Beyond optimistic accuracy: effects of spatial autocorrelation and feature redundancy in large-scale landslide susceptibility models, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-12758, https://doi.org/10.5194/egusphere-egu26-12758, 2026.