BioSense Institute, University of Novi Sad, Serbia (miljana.markovic@biosense.rs)
Soil organic carbon (SOC) stocks in natural and semi-natural ecosystems remain poorly quantified in intensively cultivated lowland regions such as Vojvodina, Serbia, which is part of the Pannonian Basin. To address this gap, we developed a general framework for large-scale, multipurpose soil sampling. We further investigate the potential of machine learning–based regression and classification approaches to predict SOC stocks in forest and grassland ecosystems, using diverse land cover and soil-related indicators as key predictors.
Soil sampling was planned through spatial clustering that combined land cover, climate, and soil information. To implement a systematic and stratified sampling scheme, we followed the LUCAS (Land Use/Cover Area frame Survey) methodology. Natural and semi-natural forest and grassland areas were delineated using Copernicus land use/land cover (LULC) data. An exploratory analysis was conducted using climate variables from the Copernicus Climate Change Service (C3S) for 2015–2024 and soil properties (soil order and type, and silt, sand, and clay proportions) to identify spatial clusters suitable for field sampling. Forests and grasslands were then clustered separately using an unsupervised K-prototypes approach, which handles mixed numeric and categorical variables. Based on this clustering, 62 representative locations were identified across forests and grasslands, and a total of 186 soil samples were collected using composite sampling at three sites per location.
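The clustering step can be illustrated with a minimal sketch of K-prototypes, which measures dissimilarity as squared Euclidean distance on numeric features plus a weight `gamma` times the number of mismatched categorical features. This is not the implementation used in the study: the feature values, soil type labels, `gamma`, and the initial prototypes below are all hypothetical toy choices, and numeric features are assumed to be standardized beforehand.

```python
from collections import Counter

def kproto_distance(num_a, cat_a, num_b, cat_b, gamma):
    """Squared Euclidean distance on numeric features plus
    gamma times the count of mismatched categorical features."""
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    mismatches = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric + gamma * mismatches

def k_prototypes(numeric, categorical, init_indices, gamma=1.0, max_iter=50):
    """Minimal K-prototypes: prototypes hold the mean of the numeric
    features and the mode of the categorical features of their members."""
    protos = [(list(numeric[i]), list(categorical[i])) for i in init_indices]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest prototype.
        new_labels = [
            min(range(len(protos)),
                key=lambda j: kproto_distance(n, c, protos[j][0], protos[j][1], gamma))
            for n, c in zip(numeric, categorical)
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: recompute mean (numeric) and mode (categorical).
        for j in range(len(protos)):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:
                continue  # keep the old prototype if a cluster is empty
            mean = [sum(numeric[i][d] for i in members) / len(members)
                    for d in range(len(numeric[0]))]
            mode = [Counter(categorical[i][d] for i in members).most_common(1)[0][0]
                    for d in range(len(categorical[0]))]
            protos[j] = (mean, mode)
    return labels, protos

# Hypothetical toy data: two standardized numeric features and one soil type label.
numeric = [(0.1, 0.2), (0.0, 0.3), (0.2, 0.1), (2.0, 2.1), (2.2, 1.9), (1.9, 2.0)]
categorical = [("Chernozem",)] * 3 + [("Fluvisol",)] * 3
labels, protos = k_prototypes(numeric, categorical, init_indices=[0, 3], gamma=1.0)
```

In practice the choice of `gamma` controls how strongly categorical variables such as soil type influence the clusters relative to the numeric climate and texture variables, and results depend on the initial prototypes.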
Land cover features were collected along 250 m transects at each location, and landscape heterogeneity was quantified using LUCAS-based diversity indicators derived from the same transects. For machine learning–based SOC stock prediction, these indicators were combined with soil descriptors, including soil texture, soil type, and geomorphology-based soil groups, as well as spatial cluster information for forest and grassland areas. SOC stock values were averaged per location, and forest and grassland samples were jointly used in the modeling to capture landscape heterogeneity.
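As an illustration of how a transect-based diversity indicator might be computed, the sketch below derives land cover richness and the Shannon diversity index from a sequence of land cover observations along a transect. The abstract does not state which diversity indicators were used, so the Shannon index and the land cover labels here are assumptions chosen for illustration only.

```python
import math
from collections import Counter

def land_cover_richness(observations):
    """Number of distinct land cover classes recorded along the transect."""
    return len(set(observations))

def shannon_diversity(observations):
    """Shannon index H = -sum(p_i * ln(p_i)), where p_i is the share of
    observations belonging to land cover class i."""
    counts = Counter(observations)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())

# Hypothetical land cover classes recorded at regular steps along one transect.
transect = ["grassland", "grassland", "woodland", "shrubland", "grassland", "woodland"]
richness = land_cover_richness(transect)
diversity = shannon_diversity(transect)
```

A transect with a single land cover class gives a Shannon index of zero, and the index grows as classes become more numerous and more evenly represented.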
Regression modeling aimed to predict continuous SOC stock values, while classification categorized SOC stock into low, medium, and high levels based on thresholds derived from K-means clustering applied to the observed SOC distribution. Among the regression models, Elastic Net achieved the highest performance, with an R² of 0.49 and an RMSE of 13.74 t ha⁻¹, indicating moderate predictive capability given the complexity of SOC stock dynamics and the limited sample size. In contrast, classification models demonstrated higher predictive reliability. Logistic Regression achieved the best performance, with an overall accuracy of 76.9% and a macro F1-score of 77.1%, suggesting that SOC stock can be more robustly distinguished across discrete classes than predicted as a continuous variable. Permutation importance analysis revealed that soil texture was the dominant predictor in both regression and classification models.
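The derivation of class thresholds and the macro F1-score can be sketched as follows. One-dimensional K-means is run on SOC stock values, thresholds are taken as midpoints between adjacent cluster centres, and macro F1 is the unweighted mean of per-class F1-scores. The SOC values, the quantile-based initialization, and the midpoint rule are assumptions made for this sketch and do not reproduce the study's data or exact procedure.

```python
def kmeans_1d(values, k=3, max_iter=100):
    """Lloyd's algorithm on scalars; centres start at evenly spaced
    positions in the sorted data (an assumed initialization)."""
    data = sorted(values)
    centres = [data[int((i + 0.5) * len(data) / k)] for i in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: abs(v - centres[j]))
            groups[nearest].append(v)
        # Empty clusters keep their previous centre.
        updated = [sum(g) / len(g) if g else centres[j] for j, g in enumerate(groups)]
        if updated == centres:
            break
        centres = updated
    return sorted(centres)

def thresholds_from_centres(centres):
    """Class boundaries at the midpoints between adjacent centres."""
    return [(a + b) / 2 for a, b in zip(centres, centres[1:])]

def to_class(value, thresholds, names=("low", "medium", "high")):
    """Map a SOC stock value to a class by counting exceeded thresholds."""
    return names[sum(value > t for t in thresholds)]

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for lab in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical SOC stock values in t/ha, one per location.
soc = [40, 42, 45, 70, 72, 75, 110, 115, 120]
thresholds = thresholds_from_centres(kmeans_1d(soc, k=3))
classes = [to_class(v, thresholds) for v in soc]
```

Because macro F1 weights each class equally, it penalizes a model that performs well only on the most frequent SOC class, which makes it a useful complement to overall accuracy when class sizes differ.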
Overall, the findings highlight the combined importance of soil properties and landscape diversity indicators for SOC stock prediction in natural and semi-natural ecosystems. While continuous SOC stock prediction remains challenging, classification into discrete SOC stock classes provides higher accuracy and more stable performance, suggesting a more reliable framework for SOC stock assessment in heterogeneous landscapes. In addition, this study establishes the first SOC reference framework for natural and semi-natural ecosystems in Vojvodina, providing a conceptual basis for spatial analysis and mapping.
How to cite: Marković, M., Brdar, S., Kalkan, K., Knežević, M., and Nikolić Lugonja, T.: From Ground Truth to Regional Insights: Soil Organic Carbon Predictions in Heterogeneous Landscapes using ML and multipurpose sampling, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-11030, https://doi.org/10.5194/egusphere-egu26-11030, 2026.