EGU23-2591
https://doi.org/10.5194/egusphere-egu23-2591
EGU General Assembly 2023
© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Data science approaches for soil carbon mapping – a call for greater transparency

Victoria Janes-Bassett (1), Richard Bassett (2), Jordan Phillipson (3), Ross Towe (4), Peter Henrys (5), and Gordon Blair (5)
  • (1) School of Environmental Sciences, University of Liverpool, Liverpool, UK
  • (2) Centre for Atmospheric Science, University of Manchester, Manchester, UK
  • (3) School of Computing and Communications, Lancaster University, Lancaster, UK
  • (4) Shell Research Ltd, London, UK
  • (5) UK Centre for Ecology and Hydrology, Lancaster, UK

Soils are the largest terrestrial store of carbon, holding more carbon than the atmosphere and the biosphere combined. Soil carbon plays a key role in delivering a wide range of ecosystem services, including climate regulation, food production, and water quality and regulation, and as such is often used as a proxy for ‘soil health’. International initiatives such as ‘Carbon 4 per mille’ highlight the potential of carbon sequestration in soils as a mechanism for climate mitigation, and the UK’s Net Zero target depends on significant land-based carbon sequestration. There is therefore a need to quantify present-day soil carbon stocks at both regional and national scales, to guide policy decisions and to provide a baseline for estimating carbon sequestration potential.

To meet this need, Digital Soil Maps (DSMs) have gained significant prominence, providing high-resolution maps through spatial extrapolation of observed data to regional, national and global scales. These maps are created by applying data-science methods to observational point data and associated covariates to build a predictive model, which is then used to extrapolate predictions over the area for which covariate information is available. The predictive models often report impressively high levels of accuracy on test/validation data. However, because the data, methods and covariates used to drive the predictive models differ, multiple DSMs created for the same area are unlikely to be identical, which indicates the uncertainty associated with these mapped products. Much as with process-based models, there is a need to understand which data-science methodology is most suitable for a given research question and to provide clarity on the magnitude of uncertainty associated with predictions.
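
The workflow described above can be sketched in a few lines. This is a minimal illustration only, not the method used in this study: it assumes a single hypothetical covariate (elevation), invented point observations, and ordinary least squares as the predictive model. It fits the model to point data, checks the error on held-out points, and then predicts over a grid where only the covariate is known.

```python
import math

def fit_ols(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def predict(model, x):
    """Apply a fitted (a, b) model to a list of covariate values."""
    a, b = model
    return [a + b * xi for xi in x]

def rmse(obs, pred):
    """Root-mean-square error between observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

# Hypothetical point observations: (elevation in m, SOC in %).
# These values are invented for illustration.
train = [(10, 2.0), (50, 2.8), (120, 4.1), (200, 5.9), (300, 8.2)]
test = [(80, 3.5), (250, 7.0)]

# Step 1: build the predictive model from point data and the covariate.
model = fit_ols([e for e, _ in train], [s for _, s in train])

# Step 2: assess accuracy on held-out test/validation points.
test_rmse = rmse([s for _, s in test], predict(model, [e for e, _ in test]))

# Step 3: extrapolate over the area where the covariate is available.
elevation_grid = [[20, 60, 100], [150, 220, 280]]  # hypothetical raster
soc_map = [predict(model, row) for row in elevation_grid]
```

Replacing `fit_ols` with a different model while keeping the observations and covariates fixed would produce a different `soc_map` from the same inputs, which is the source of uncertainty examined here.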

In this study, we quantify the uncertainty in DSMs that results from methodological choice. We apply several approaches (Random Forest, Gaussian Process, Generalised Additive Model, Neural Network and Linear Regression) to create multiple predictive models of soil organic carbon (SOC) concentration across the UK. By allowing each model to select from identical input data, we provide a fair comparison of the approaches and isolate the uncertainty in DSMs that arises from methodological choice. In addition to assessing the accuracy of each generated DSM, we evaluate the suitability of each method for DSM application. Most crucially, we highlight the need for caution about the assumed accuracy of generated DSMs when only standard validation statistics are considered, and the limitations of these approaches when the data have a bimodal distribution, a common feature of data that encompass both mineral and organic soils. Whilst standard statistics evaluating the overall accuracy of the DSMs are highly significant, levels of accuracy vary considerably across land use classifications. Our study highlights the need for increased transparency in communicating the uncertainty and limitations of derived map products.
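
The gap between overall and per-class accuracy can be illustrated with a small worked example. The sketch below uses invented validation records and hypothetical land-use labels, and assumes a model that predicts only the mean SOC of each class. Because the data are bimodal (mineral versus organic soils), such a model explains most of the overall variance while explaining none of the variation within any class.

```python
from collections import defaultdict

def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

# Hypothetical validation records: (land-use class, observed SOC %, predicted SOC %).
# Values are invented; each prediction is simply the class mean.
records = [
    ("arable", 2.0, 3.0), ("arable", 3.0, 3.0), ("arable", 4.0, 3.0),
    ("grassland", 4.0, 5.0), ("grassland", 5.0, 5.0), ("grassland", 6.0, 5.0),
    ("bog", 40.0, 45.0), ("bog", 45.0, 45.0), ("bog", 50.0, 45.0),
]

# Standard validation statistic computed over all records together.
overall = r_squared([o for _, o, _ in records], [p for _, _, p in records])

# The same statistic computed separately within each land-use class.
grouped = defaultdict(lambda: ([], []))
for land_use, obs, pred in records:
    grouped[land_use][0].append(obs)
    grouped[land_use][1].append(pred)
by_class = {name: r_squared(obs, pred) for name, (obs, pred) in grouped.items()}

print(f"overall R2 = {overall:.3f}")
for name, value in by_class.items():
    print(f"{name}: R2 = {value:.3f}")
```

With these invented numbers the overall R² is about 0.98, yet the R² within each class is 0: the headline statistic is driven almost entirely by the contrast between mineral and organic soils.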

How to cite: Janes-Bassett, V., Bassett, R., Phillipson, J., Towe, R., Henrys, P., and Blair, G.: Data science approaches for soil carbon mapping – a call for greater transparency, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-2591, https://doi.org/10.5194/egusphere-egu23-2591, 2023.