Ensemble machine learning improves pedotransfer functions for predicting soil mineral associated organic carbon
- 1Institute of Applied Remote Sensing and Information Technology, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou 310058, China
- 2ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou 311200, China
- 3European Commission, Joint Research Centre (JRC), Ispra, Italy
- 4INRAE, Unité InfoSol, Orléans 45075, France
Soil organic carbon (SOC) sequestration is a promising natural climate solution for capturing atmospheric CO2, and it also provides crucial co-benefits by improving soil functions and services. Given that SOC is not a single, uniform entity, further knowledge of SOC fractions with distinctive properties, such as particulate organic carbon (POC) and mineral-associated organic carbon (MAOC), is needed to fully understand how SOC responds to environmental change. Despite their importance, information on POC and MAOC is still scarce in soil databases, especially at large scales. Pedotransfer functions (PTFs) are a useful method for estimating missing soil parameters, but their application to SOC fractions has received little attention. We assessed the potential of predicting MAOC with machine learning-based PTFs (random forest (RF), Cubist, and gradient boosted machine (GBM)) combined with predictor selection methods (recursive feature elimination (RFE) and forward recursive feature selection (FRFS)) on 352 representative mineral topsoil samples (0–20 cm) from across Europe. Repeated validation (100 times) showed that machine learning-based PTFs were capable of accurately predicting MAOC. RFE effectively reduced the number of predictors from 21 to 12 while performing comparably to models using all predictors. With only 6 predictors (SOC, silt + clay, nitrogen, nitrogen deposition, soil erosion, and sand), the proposed FRFS algorithm outperformed RFE and had the best model parsimony. Of the three machine learning models, Cubist performed best when combined with FRFS. Our results also showed that five model ensemble approaches can increase model accuracy and robustness compared with a single machine learning model. This study offers a valuable reference for coupling PTFs with legacy soil databases to improve the spatial coverage and effectiveness of machine learning-based predictions of SOC fractions.
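The idea behind forward predictor selection can be illustrated with a minimal sketch. It is not the FRFS algorithm evaluated in this study, whose details are not given here. It is a generic greedy forward search that adds one predictor at a time and stops when the hold-out RMSE no longer improves by a chosen margin. Several things are assumptions made for illustration only: ordinary least squares stands in for RF, Cubist, and GBM; the data are synthetic; and the names `forward_select`, `min_gain`, `noise_1`, and `noise_2` are hypothetical.

```python
import random


def fit_ols(X, y):
    """Fit least squares with an intercept by solving the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def holdout_rmse(cols, X_tr, y_tr, X_va, y_va):
    """Train on the chosen columns and return RMSE on the validation split."""
    def sub(X):
        return [[r[c] for c in cols] for r in X]
    beta = fit_ols(sub(X_tr), y_tr)
    preds = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in sub(X_va)]
    return (sum((p - t) ** 2 for p, t in zip(preds, y_va)) / len(y_va)) ** 0.5


def forward_select(X_tr, y_tr, X_va, y_va, min_gain=0.05):
    """Greedy forward selection (hypothetical helper, not the study's FRFS).

    A predictor is added only if it lowers the hold-out RMSE by at least
    the relative margin `min_gain`.
    """
    selected, remaining = [], list(range(len(X_tr[0])))
    best = holdout_rmse(selected, X_tr, y_tr, X_va, y_va)  # intercept-only baseline
    while remaining:
        scores = {c: holdout_rmse(selected + [c], X_tr, y_tr, X_va, y_va)
                  for c in remaining}
        cand = min(scores, key=scores.get)
        if scores[cand] > best * (1 - min_gain):
            break  # no remaining predictor helps enough
        selected.append(cand)
        remaining.remove(cand)
        best = scores[cand]
    return selected, best


# Synthetic toy data, NOT the study's data: two informative columns, two noise columns.
random.seed(0)
names = ["soc", "silt_clay", "noise_1", "noise_2"]


def make(n):
    X = [[random.gauss(0, 1) for _ in names] for _ in range(n)]
    y = [2.0 * r[0] + 0.5 * r[1] + random.gauss(0, 0.1) for r in X]
    return X, y


X_tr, y_tr = make(200)
X_va, y_va = make(100)
selected, score = forward_select(X_tr, y_tr, X_va, y_va)
print([names[c] for c in selected], round(score, 3))
```

In this toy setting the search should keep the two informative columns and discard the noise columns, which mirrors the parsimony argument made above. A real application would replace the single hold-out split with repeated validation and the linear model with the learners named in the abstract.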
How to cite: Xiao, Y., Xue, J., Zhang, X., Wang, N., Lugato, E., Arrouays, D., Shi, Z., and Chen, S.: Ensemble machine learning improves pedotransfer functions for predicting soil mineral associated organic carbon, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-1500, https://doi.org/10.5194/egusphere-egu23-1500, 2023.