EGU25-12226, updated on 15 Mar 2025
https://doi.org/10.5194/egusphere-egu25-12226
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Tuesday, 29 Apr, 09:55–10:05 (CEST) | Room -2.20
Leveraging NIR Spectroscopy and Machine Learning Models for Estimating Organic Carbon Concentration in Agricultural Soils
Claudia Gambale, Anastasia Shchegolikhina, Andrea Lazzari, Andrea Gasparini, Dario Benedini, Alessandro Buccioli, and Giovanni Cabassi
  • Council for Agricultural Research and Economics, Research Centre Animal Production and Aquaculture (CREA-ZA), Lodi (LO), Italy

Precision agriculture is a sophisticated and sustainable approach to soil management that optimizes resource use while minimizing environmental impact. A key challenge in this context is the estimation of soil organic carbon (SOC), a critical parameter for assessing soil health, fertility, and carbon sequestration potential. Traditional SOC analysis methods, while accurate, are often time-consuming and cost-prohibitive, which limits their scalability. Rapid methods such as near-infrared (NIR) spectroscopy, combined with machine learning-based predictive models of SOC, are therefore a promising route to fast, low-cost mapping. This study explores the application of NIR spectroscopy and machine learning at field scale to produce maps of organic carbon levels.

The calibration dataset comprises 460 soil samples from northern Italy, while the validation dataset consists of 75 samples from two fields located in the Po Valley. Soil samples were collected on a regular 50-meter grid at a depth of 30 centimeters. To obtain reference SOC concentrations for mapping, these samples were analyzed by an external laboratory using standard wet reference methods. The NIR analysis was conducted using the NIRFlex N500 (Buchi) in diffuse reflectance mode over the 1000–2500 nm range.

Calibration models were built using three machine learning techniques: i) Locally Weighted Regression (LWR), configured with 30 local points selected from the local PLS space using 4 latent variables; ii) Gradient Boosted Tree Regression (XGBoost), with max_depth set to 4 and num_round set to 300 to prevent overfitting; iii) Deep Learning Artificial Neural Network (ANNDL), implemented using TensorFlow as the framework and RMSprop as the optimizer. The network was designed as a multilayer densely connected architecture. The spectral data were first compressed using PLS (8 latent variables) to improve training performance.
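As an illustration of the LWR configuration described above, the following is a minimal sketch and not the authors' implementation. The abstract specifies only the 30 local points and the 4 PLS latent variables; the function names `pls1_rotation` and `lwr_predict`, the Euclidean distance in score space, the tricube weighting, and the local linear fit on the scores are all assumptions made for the example.

```python
import numpy as np


def pls1_rotation(X, y, n_components):
    """Fit PLS1 by NIPALS; return x_mean and R so that scores = (X - x_mean) @ R."""
    x_mean = X.mean(axis=0)
    Xk = X - x_mean
    yk = y - y.mean()
    W, P = [], []
    for _ in range(n_components):
        w = Xk.T @ yk                 # weight vector: covariance of X with y
        w = w / np.linalg.norm(w)
        t = Xk @ w                    # scores for this latent variable
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        q = (yk @ t) / tt             # y loading
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - q * t               # deflate y
        W.append(w)
        P.append(p)
    W = np.array(W).T
    P = np.array(P).T
    return x_mean, W @ np.linalg.inv(P.T @ W)


def lwr_predict(X_cal, y_cal, X_new, n_components=4, n_local=30):
    """Predict each new sample from its nearest calibration samples in PLS score space."""
    x_mean, R = pls1_rotation(X_cal, y_cal, n_components)
    T_cal = (X_cal - x_mean) @ R
    T_new = (X_new - x_mean) @ R
    preds = np.empty(len(X_new))
    for i, t in enumerate(T_new):
        dist = np.linalg.norm(T_cal - t, axis=1)   # assumed: Euclidean distance
        idx = np.argsort(dist)[:n_local]           # the n_local nearest points
        u = dist[idx] / (dist[idx].max() + 1e-12)
        sw = np.sqrt((1.0 - u**3) ** 3)            # assumed: tricube weights
        A = np.column_stack([np.ones(len(idx)), T_cal[idx]])
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y_cal[idx] * sw, rcond=None)
        preds[i] = coef[0] + t @ coef[1:]
    return preds
```

Here `X_cal` and `X_new` would be arrays of spectra (one row per sample) and `y_cal` the reference SOC values. Any spectral preprocessing is left out because the abstract does not describe it.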

The NIR-based estimates of organic carbon content were evaluated using the Root Mean Square Error (RMSE) and BIAS metrics. In cross-validation, the calibration models gave the following results: i) LWR: RMSECV = 3.53, BIAS (cal) = 0.04; ii) XGBoost: RMSECV = 3.04, BIAS (cal) = 0.06; iii) ANNDL: RMSECV = 3.32, BIAS (cal) = -0.07. For the first validation field, the prediction metrics were: i) LWR: RMSEP = 2.14, BIAS (pred) = 0.05; ii) XGBoost: RMSEP = 2.77, BIAS (pred) = 0.90; iii) ANNDL: RMSEP = 2.40, BIAS (pred) = 0.64. For the second field, they were: i) LWR: RMSEP = 3.05, BIAS (pred) = 0.31; ii) XGBoost: RMSEP = 2.52, BIAS (pred) = 0.72; iii) ANNDL: RMSEP = 2.05, BIAS (pred) = 0.67.
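The two metrics reported above can be computed as in the following minimal sketch. The function names `rmse` and `bias` are hypothetical, and the sign convention (predicted minus reference, so a positive bias means overestimation) is an assumption, since the abstract does not state it.

```python
import math


def rmse(predicted, reference):
    """Root mean square error between predicted and reference values."""
    residuals = [p - r for p, r in zip(predicted, reference)]
    return math.sqrt(sum(e * e for e in residuals) / len(residuals))


def bias(predicted, reference):
    """Mean residual; assumed convention: predicted minus reference."""
    residuals = [p - r for p, r in zip(predicted, reference)]
    return sum(residuals) / len(residuals)
```

Applied to cross-validated predictions on the calibration set, `rmse` would correspond to RMSECV. Applied to the independent field samples, it would correspond to RMSEP.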

To compare the practical performance of the NIR models with the reference method, each SOC concentration map was divided into two homogeneous zones (high and low SOC). The map obtained with each NIR model was then overlaid on the map obtained with the reference method to calculate the percentage of consensus area. A model was considered suitable for precision agriculture purposes if the overlap exceeded 70%.
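The consensus-area check described above can be sketched as follows, under stated assumptions. The abstract does not say how the high and low zones were delineated, so this example splits each map at its own median. It also assumes both maps are sampled on the same equal-area grid cells, so that the share of agreeing cells equals the share of agreeing area. All function names are hypothetical.

```python
from statistics import median


def classify_zones(soc_values):
    """Label each cell True (high SOC) or False (low SOC); assumed: median split."""
    threshold = median(soc_values)
    return [value > threshold for value in soc_values]


def consensus_percent(nir_map, reference_map):
    """Percentage of cells assigned to the same zone in both maps."""
    nir_zones = classify_zones(nir_map)
    ref_zones = classify_zones(reference_map)
    agree = sum(n == r for n, r in zip(nir_zones, ref_zones))
    return 100.0 * agree / len(ref_zones)


def is_suitable(nir_map, reference_map, min_consensus=70.0):
    """Apply the 70% overlap criterion stated in the text."""
    return consensus_percent(nir_map, reference_map) > min_consensus
```

Each map is passed as a flat list of SOC values, one per grid cell, in the same cell order for both maps.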

How to cite: Gambale, C., Shchegolikhina, A., Lazzari, A., Gasparini, A., Benedini, D., Buccioli, A., and Cabassi, G.: Leveraging NIR Spectroscopy and Machine Learning Models for Estimating Organic Carbon Concentration in Agricultural Soils, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-12226, https://doi.org/10.5194/egusphere-egu25-12226, 2025.