- ¹Osnabrück University, Joint Lab Artificial Intelligence and Data Science, Osnabrück, Germany
- ²Leibniz Institute for Agricultural Engineering and Bioeconomy (ATB), Department of Agromechatronics, Potsdam, Germany
- ³German Research Center for Artificial Intelligence (DFKI), Research Department Plan-Based Robot Control, Osnabrück, Germany
Digital soil mapping (DSM) relies on a broad pool of statistical methods, yet determining the optimal method for a given context remains challenging. Large benchmarking studies are needed to reveal the strengths and limitations of commonly used methods. Existing DSM benchmarking studies usually rely on a single dataset with restricted access, leading to incomplete and potentially biased conclusions. To address these issues, we introduce an open-access dataset collection called Precision Liming Soil Datasets (LimeSoDa). LimeSoDa consists of 31 field- and farm-scale datasets. Each dataset has three target soil properties: soil organic matter (SOM) or soil organic carbon (SOC), clay, and pH, alongside a set of features. Features are dataset-specific and were derived from spectroscopy, proximal soil sensors, and remote sensing. All datasets were processed into a tabular format and are “ready-to-go” for modeling. We demonstrated the use of LimeSoDa for benchmarking by comparing the predictive power of four learning algorithms across all LimeSoDa datasets: multiple linear regression (MLR), support vector regression (SVR), categorical boosting (CatBoost), and random forest (RF). The results showed that no learning algorithm was generally superior. MLR and SVR performed better on high-dimensional spectral datasets due to better compatibility with principal components. In contrast, CatBoost and RF performed considerably better on all other datasets. These benchmarking results illustrate that the performance of a method can be highly context-dependent. LimeSoDa therefore provides a crucial data resource for improving the development and evaluation of machine learning methods in DSM and pedometrics.
How to cite: Schmidinger, J., Vogel, S., and Atzmueller, M.: LimeSoDa: A Dataset Collection for Benchmarking of Machine Learning Regressors in Digital Soil Mapping, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-9905, https://doi.org/10.5194/egusphere-egu25-9905, 2025.