- UM6P, Morocco (nabil.farah@um6p.ma)
Accurate estimation of Soil Organic Carbon (SOC) is essential for sustainable soil management and carbon stock assessment, and it requires efficient, non-invasive methods for SOC quantification. This study leverages PRISMA hyperspectral imagery and machine learning techniques to predict SOC in Moroccan cereal-based agricultural soils. To this end, a detailed data processing pipeline was implemented, including denoising, band filtering, and feature engineering techniques such as Principal Component Analysis (PCA) for dimensionality reduction, spectral index calculations (e.g., NDVI, BSI, MSI), and Recursive Feature Elimination (RFE) to identify the most informative spectral features. Field data were collected in the Ain Korma commune, Province of Meknes, where 60 soil sampling points were established. At each sampling location, a polygon comprising four corner points and a center point was defined. Soil samples were extracted using an auger, and the individual samples from the five points were combined into a composite sample representing the average soil characteristics of the area. The field sample coordinates were transformed into the image coordinate reference system to enable extraction of the spectral data for the corresponding pixels. The modeling process showed marked improvements in predictive accuracy once preprocessing and feature selection were applied. Initially, the XGBoost model achieved a low coefficient of determination (R² = 0.08). We believe this low R² is most likely due to the high dimensionality of the hyperspectral data, redundant information, and strongly correlated spectral bands, which hindered the model's ability to generalize.
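The pixel lookup and spectral index steps described above can be illustrated with a minimal Python sketch. It assumes a north-up raster described by a GDAL-style six-element geotransform and field coordinates already expressed in the image coordinate reference system; the function names, the geotransform values, and the reflectance values are hypothetical and are not taken from the study.

```python
def world_to_pixel(x, y, geotransform):
    """Map a projected coordinate to (row, col) in a north-up raster.

    geotransform follows the GDAL ordering:
    (origin_x, pixel_width, row_rotation, origin_y, col_rotation, pixel_height),
    where pixel_height is negative for north-up images.
    Rotation terms are ignored here (assumed to be zero).
    """
    origin_x, pixel_w, _, origin_y, _, pixel_h = geotransform
    col = int((x - origin_x) // pixel_w)
    row = int((y - origin_y) // pixel_h)
    return row, col


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    denom = nir + red
    return (nir - red) / denom if denom != 0 else float("nan")


# Hypothetical 30 m grid with its top-left corner at (500000, 3700000).
gt = (500000.0, 30.0, 0.0, 3700000.0, 0.0, -30.0)

# Hypothetical sampling point, already in the image coordinate system.
row, col = world_to_pixel(500075.0, 3699955.0, gt)

# Hypothetical reflectances read from the NIR and red bands at that pixel.
index_value = ndvi(nir=0.5, red=0.1)
print(row, col, round(index_value, 3))
```

In practice the reflectances would be read from the image array at `[row, col]` for the bands closest to the red and near-infrared wavelengths; the same pattern extends to other indices such as BSI or MSI with their own band combinations.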
To overcome these limitations, we implemented an advanced preprocessing workflow that combines the removal of noisy and absorption bands (e.g., water vapor), co-registration of the PRISMA imagery with Sentinel-2, denoising with wavelet and Savitzky–Golay filtering, and Principal Component Analysis (PCA) alongside the calculation of spectral indices. Following these preprocessing steps, multiple machine learning algorithms were applied to predict SOC. Among the tested models, Recursive Feature Elimination (RFE) combined with XGBoost achieved the best performance, with a coefficient of determination (R²) of 0.32 and a Mean Absolute Error (MAE) of 0.35. Partial Least Squares Regression (PLSR) performed similarly, attaining an R² of 0.30 and an MAE of 0.34. Further work will explore other ways to improve model performance. These preliminary results underscore the critical role of data preprocessing and feature selection in enhancing model performance for SOC estimation by addressing the limitations of hyperspectral data.
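For readers less familiar with the two metrics reported above, the following Python sketch shows how R² and MAE are commonly computed from observed and predicted values. The sample values are invented for illustration only and are not the study's data; the function names are likewise hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Invented SOC values for illustration (not from the study).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

print(mean_absolute_error(observed, predicted))
print(r_squared(observed, predicted))
```

For these invented values the MAE is 0.25 and R² is 0.9. MAE is expressed in the same units as the SOC measurements, whereas R² is unitless and can fall below zero when a model predicts worse than the mean of the observations.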
How to cite: Farah, N., Laamrani, A., and Bouabid, R.: Estimation Soil Organic Carbon Using Hyperspectral Imaging and Machine Learning: A Case Study in Moroccan Agricultural Soils, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-20502, https://doi.org/10.5194/egusphere-egu25-20502, 2025.