EGU23-1497, updated on 22 Feb 2023
https://doi.org/10.5194/egusphere-egu23-1497
EGU General Assembly 2023
© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Developing parsimonious model for digital soil mapping using forward recursive feature selection

Songchao Chen1,2, Xianglin Zhang2, Jie Xue2, Nan Wang2, Yi Xiao2, Zhou Shi2, Anne Richer-de-Forges3, and Dominique Arrouays3
  • 1ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou 311200, China
  • 2Institute of Applied Remote Sensing and Information Technology, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou 310058, China
  • 3INRAE, Unité InfoSol, Orléans 45075, France

In the context of increasing soil degradation worldwide, spatially explicit soil information is urgently needed to support decision-making for sustaining limited soil resources. Digital soil mapping (DSM) has proven to be an efficient way to deliver soil information from local to global scales. The number of environmental covariates used for DSM has increased rapidly with the growing volume of remote sensing data, so variable selection is necessary to deal with multicollinearity and improve model parsimony. We propose a modified greedy feature selection method, forward recursive feature selection (FRFS), for DSM regression, and compare it with Boruta, recursive feature elimination (RFE), and variance inflation factor (VIF) analysis. To this end, we used quantile regression forest with 402 soil samples and 392 environmental covariates to map the spatial distribution of soil organic carbon density (SOCD) in Northeast and North China. The results showed that FRFS selected the most parsimonious model, with only 9 covariates (e.g., brightness index, mean annual temperature), far fewer than RFE (22 covariates), VIF (30 covariates), and Boruta (76 covariates). Repeated validation (50 times) showed that the FRFS-derived model performed better than those using the full covariate set, Boruta, RFE, and VIF. Although the uncertainty estimates performed similarly across methods, as measured by the prediction interval coverage probability (PICP), the models using FRFS and RFE had the lowest global uncertainty (0.86), as indicated by the uncertainty index. In addition, FRFS had the best computational efficiency when both variable selection and map prediction were considered. Given these advantages over Boruta, RFE, and VIF, FRFS has high potential for fine-resolution soil mapping, especially for broad-scale studies involving heavy computation on millions or billions of pixels.
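The abstract does not spell out the FRFS procedure, so the following is only a minimal sketch of generic greedy forward selection, the family of methods FRFS is described as modifying. It assumes a user-supplied scoring function (in practice this might be a cross-validated quantile regression forest score); the names `forward_select`, `toy_score`, `TOY_GAIN`, and `min_gain`, the covariates `ndvi` and `slope`, and all numeric values are hypothetical and chosen purely for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def forward_select(
    candidates: Sequence[str],
    score_fn: Callable[[List[str]], float],
    min_gain: float = 0.0,
) -> Tuple[List[str], float]:
    """Greedy forward selection (illustrative, not the authors' FRFS).

    Starting from an empty set, add in each round the covariate that
    gives the highest score, and stop once the best addition no longer
    improves the score by more than `min_gain`.
    """
    selected: List[str] = []
    remaining = list(candidates)
    best_score = float("-inf")
    while remaining:
        # Score each remaining covariate when added to the current subset.
        trials = [(score_fn(selected + [c]), c) for c in remaining]
        trial_score, trial_cov = max(trials)
        # Stop when the best candidate does not improve the score enough.
        if trial_score - best_score <= min_gain:
            break
        selected.append(trial_cov)
        remaining.remove(trial_cov)
        best_score = trial_score
    return selected, best_score


# Hypothetical per-covariate contributions; a stand-in for a real,
# cross-validated model score (higher is better).
TOY_GAIN = {
    "brightness_index": 0.30,
    "mean_annual_temp": 0.25,
    "ndvi": 0.02,
    "slope": 0.00,
}


def toy_score(subset: List[str]) -> float:
    # Reward informative covariates and penalise every extra covariate,
    # so weakly informative ones are not worth adding.
    return sum(TOY_GAIN[c] for c in subset) - 0.05 * len(subset)


selected, score = forward_select(list(TOY_GAIN), toy_score)
print(selected, round(score, 2))
```

With this toy scorer, the loop keeps only the two covariates whose contribution outweighs the per-covariate penalty, which mirrors the parsimony argument in the abstract: each round costs one model fit per remaining covariate, but the final model that must be applied to every pixel is small.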

How to cite: Chen, S., Zhang, X., Xue, J., Wang, N., Xiao, Y., Shi, Z., Richer-de-Forges, A., and Arrouays, D.: Developing parsimonious model for digital soil mapping using forward recursive feature selection, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-1497, https://doi.org/10.5194/egusphere-egu23-1497, 2023.