EGU25-9808, updated on 14 Mar 2025
https://doi.org/10.5194/egusphere-egu25-9808
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Tuesday, 29 Apr, 12:09–12:19 (CEST) | Room G1
Modeling topsoil organic carbon in proglacial areas worldwide using interpretive machine learning
Collin van Rooij1, Gerard Heuvelink1, Arnaud Temme2, Sigrid van Grinsven3, and Titia Mulder1
  • 1Wageningen University and Research, Soil Geography and Landscape, Netherlands (collinvrooij@gmail.com)
  • 2University of Colorado at Boulder, Institute of Arctic and Alpine Research
  • 3Technical University of Munich, Geomorphology and Soil Science

Proglacial areas emerge where glaciers retreat as a result of climate change. Because glaciers recede steadily, these ‘natural laboratories’ form chronosequences, in which terrain farther from the ice has been exposed for longer, and are thus ideally suited to studying soil formation. We harmonized data from several studies, resulting in 673 soil samples from 29 proglacial areas worldwide. We used Random Forests (RF) to assess how well machine learning (ML) can predict topsoil organic carbon, and 10-fold nested cross-validation to tune the model and to obtain performance estimates that are not inflated by overfitting. We selected 37 covariates as proxies of the soil-forming factors, including modeled temperature and precipitation to reflect climate and geomorphological indices such as slope to reflect relief. The covariates were either measured in situ or, in most cases, derived from globally available (satellite) data; the remotely sensed covariates were retrieved from openly available data through Google Earth Engine. We also analyzed how the models perform when supplied with subsets of covariates grouped by their associated soil-forming factor. Additionally, we left out whole proglacial areas, or even entire regions, during model fitting to test how well the models transfer to other proglacial areas worldwide.
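
The nested cross-validation described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: a toy one-covariate k-nearest-neighbour regressor stands in for the Random Forest so that the example needs no external libraries, the data are synthetic, and the function names (`k_folds`, `cv_r2`, `nested_cv`) and the inner fold count of 5 are hypothetical choices.

```python
import math
import random

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def k_folds(indices, n_folds, seed):
    """Shuffle the indices and deal them into n_folds disjoint folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]

def knn_predict(train, x, k):
    """Toy stand-in model (assumption): mean target of the k nearest points."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def cv_r2(data, indices, k, n_folds, seed):
    """Pooled cross-validated R2 for a fixed k, using only `indices`."""
    y_true, y_pred = [], []
    for fold in k_folds(indices, n_folds, seed):
        held_out = set(fold)
        train = [data[i] for i in indices if i not in held_out]
        for i in fold:
            y_true.append(data[i][1])
            y_pred.append(knn_predict(train, data[i][0], k))
    return r2_score(y_true, y_pred)

def nested_cv(data, candidate_ks, n_outer=10, n_inner=5, seed=0):
    """Outer loop estimates performance; inner loop tunes k on training data only."""
    all_idx = list(range(len(data)))
    y_true, y_pred, chosen = [], [], []
    for fold in k_folds(all_idx, n_outer, seed):
        held_out = set(fold)
        train_idx = [i for i in all_idx if i not in held_out]
        # Inner CV never sees the outer test fold, so tuning cannot leak into the score
        best_k = max(candidate_ks,
                     key=lambda k: cv_r2(data, train_idx, k, n_inner, seed + 1))
        chosen.append(best_k)
        train = [data[i] for i in train_idx]
        for i in fold:
            y_true.append(data[i][1])
            y_pred.append(knn_predict(train, data[i][0], best_k))
    return r2_score(y_true, y_pred), chosen

# Synthetic (covariate, target) pairs; purely illustrative, not study data
rng = random.Random(42)
xs = [rng.uniform(0.0, 10.0) for _ in range(200)]
data = [(x, math.sin(x) + rng.gauss(0.0, 0.2)) for x in xs]

score, chosen_ks = nested_cv(data, candidate_ks=(1, 5, 15))
print(f"nested-CV R2 = {score:.2f}; k chosen per outer fold: {chosen_ks}")
```

Assigning folds by proglacial area rather than by random shuffling would turn the outer loop into a leave-area-out evaluation of the kind described above.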

The RF model with all covariates reached an R² of 0.5, explaining only about half of the variation in topsoil organic carbon (OC). Models fitted on subsets of the covariates performed only slightly worse. Using Shapley values, an interpretive ML method, we found that the normalized difference vegetation index (NDVI) and Age have the largest influence on topsoil OC content. However, the relations between the covariates and topsoil OC remain complex, as shown by the small differences in variable importance and by the changes in importance when certain variables are omitted. Site-specific Shapley values suggest that the local and global drivers of soil organic carbon (SOC) sequestration differ. Relief variables, for example, have a substantial effect within individual areas, whereas climatic variables are more important at the global scale. Although Shapley values cannot establish a direct cause-and-effect relationship between soil-forming factors and topsoil OC content in proglacial areas, they clarify the benefit of including variables such as NDVI and Age in an ML framework and help to gain insight beyond prediction.
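
To make the idea behind Shapley values concrete, the sketch below computes them exactly for a single prediction by enumerating all covariate coalitions. It is an illustration under stated assumptions, not the workflow used in the study: `toy_model`, its coefficients, and the sample and baseline values are invented for the example, and covariates outside a coalition are simply set to a fixed baseline, whereas practical tools usually average over a background data set.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    names = list(x)
    n = len(names)

    def value(coalition):
        # Covariates in the coalition take the sample's values; the rest stay at baseline
        mixed = {m: (x[m] if m in coalition else baseline[m]) for m in names}
        return model(mixed)

    phi = {}
    for name in names:
        others = [m for m in names if m != name]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {name}) - value(members))
        phi[name] = total
    return phi

def toy_model(f):
    """Hypothetical response with an NDVI-age interaction (made-up coefficients)."""
    return (1.0 + 4.0 * f["ndvi"] + 0.02 * f["age"]
            + 0.05 * f["ndvi"] * f["age"] - 0.01 * f["slope"])

# Invented sample and baseline values, for illustration only
sample = {"ndvi": 0.6, "age": 120.0, "slope": 15.0}
baseline = {"ndvi": 0.1, "age": 10.0, "slope": 20.0}

phi = shapley_values(toy_model, sample, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")

# Efficiency property: contributions sum to the prediction minus the baseline prediction
print(sum(phi.values()), toy_model(sample) - toy_model(baseline))
```

Exact enumeration evaluates the model for every subset of covariates, so its cost doubles with each added covariate; with dozens of covariates, approximate or model-specific algorithms are needed instead.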

How to cite: van Rooij, C., Heuvelink, G., Temme, A., van Grinsven, S., and Mulder, T.: Modeling topsoil organic carbon in proglacial areas worldwide using interpretive machine learning, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-9808, https://doi.org/10.5194/egusphere-egu25-9808, 2025.