EGU26-5306, updated on 13 Mar 2026
https://doi.org/10.5194/egusphere-egu26-5306
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Monday, 04 May, 10:45–12:30 (CEST), Display time Monday, 04 May, 08:30–12:30
Hall X4, X4.16
Physics-Informed Machine Learning Workflow for Free Hydrocarbon Content (S1) Prediction in Organic-Rich Shale Formation
Roufeida Bennani1 and Min Wang2
  • 1China University of Petroleum (East China), School of Geosciences, Department of Geological Resources and Geological Engineering, Qingdao, China (lb2301009@s.upc.edu.cn)
  • 2China University of Petroleum (East China), School of Geosciences, Department of Geological Resources and Geological Engineering, Qingdao, China (wangm@upc.edu.cn)

Free hydrocarbon content (S1) is a key parameter for source rock evaluation and sweet-spot identification in organic-rich shale reservoirs. Accurate determination of S1 is essential for petroleum exploration; however, traditional measurements rely on expensive core acquisition and Rock-Eval pyrolysis, which limits spatial coverage and operational efficiency. Empirical and log-based models often fail to capture the complex non-linear relationships between S1, petrophysical logs, and geochemical properties. This study presents an integrated, physics-informed machine learning workflow for predicting S1 from well-log, mineralogical, and geomechanical data in the upper Shahejie Formation. The dataset comprises 357 S1 core measurements matched to high-resolution well logs (gamma ray, acoustic travel time, density, neutron porosity, and resistivity) over a 799 m stratigraphic interval. To address the inherent depth mismatch between irregularly spaced cores and regularly sampled logs, a constrained nearest-neighbor depth-matching strategy was implemented and validated. Quality control confirmed minimal bias and high precision, ensuring that the observed log-S1 correlations represent true petrophysical trends rather than artifacts of misalignment. Physics-informed feature engineering was applied to capture geological ratios, porosity interactions, and depth trends. Interaction terms, including resistivity-TOC combinations, were incorporated to reflect hydrocarbon saturation and organic matter effects. Six machine learning algorithms were evaluated, including tree-based ensembles, kernel-based methods, and neural networks. The gradient boosting model achieved the best performance, with a correlation coefficient of 0.92 on independent test data and an RMSE of 0.58, representing a 33% improvement over a logs-only baseline. Cross-validation grouped by unique S1 measurements was used to prevent data leakage and demonstrated stable generalization across the dataset.
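The leakage-free cross-validation mentioned above can be illustrated with a minimal sketch. It assumes that depth matching can leave one core measurement associated with several log rows, so that all rows sharing a measurement must stay in the same fold. The function name `grouped_kfold`, the greedy fold balancing, and the sample identifiers are hypothetical choices for illustration, not the authors' implementation.

```python
from collections import defaultdict


def grouped_kfold(group_ids, n_splits=5):
    """Yield (train_idx, test_idx) pairs in which no group spans both sides.

    group_ids: one label per row, e.g. the ID of the core measurement
    that the row was matched to (hypothetical labelling).
    """
    # Collect the row indices that belong to each group.
    rows_by_group = defaultdict(list)
    for row, group in enumerate(group_ids):
        rows_by_group[group].append(row)

    if n_splits < 2 or n_splits > len(rows_by_group):
        raise ValueError("n_splits must be between 2 and the number of groups")

    # Greedy balancing: place the largest groups first, each into the
    # fold that currently holds the fewest rows.
    folds = [[] for _ in range(n_splits)]
    for group in sorted(rows_by_group, key=lambda g: -len(rows_by_group[g])):
        min(folds, key=len).extend(rows_by_group[group])

    # Each fold serves once as the test set; the rest form the training set.
    for k in range(n_splits):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, test


# Hypothetical example: seven log rows matched to four core measurements.
sample_ids = ["c1", "c1", "c2", "c3", "c3", "c3", "c4"]
for train_idx, test_idx in grouped_kfold(sample_ids, n_splits=2):
    train_groups = {sample_ids[i] for i in train_idx}
    test_groups = {sample_ids[i] for i in test_idx}
    # No measurement appears on both sides of the split.
    assert train_groups.isdisjoint(test_groups)
```

Splitting by row instead of by group would allow near-duplicate rows from the same measurement to land in both training and test sets, which tends to inflate the reported test score.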
Feature importance analysis highlights the dominant contribution of physics-informed terms, confirming that physically constrained predictors outperform individual logging or geochemical parameters. The proposed workflow enables continuous S1 profiling with minimal core measurements, supporting reservoir characterization and sweet-spot identification while reducing reliance on expensive geochemical analyses. This study demonstrates how combining rigorous depth alignment, physics-guided feature engineering, and machine learning can deliver reliable continuous S1 prediction for shale energy resources.
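As a rough illustration of the depth-alignment step, the sketch below pairs each core depth with its nearest log sample and rejects the pair if the depth offset exceeds a tolerance. The abstract does not specify the constraint used, so the tolerance `max_offset`, the function names, and all depth values here are assumptions made for the example only.

```python
from bisect import bisect_left


def match_cores_to_logs(core_depths, log_depths, max_offset):
    """Return, for each core depth, the index of the nearest log sample,
    or None when the nearest sample is farther away than max_offset.

    log_depths must be sorted in ascending order.
    """
    matches = []
    for depth in core_depths:
        # The nearest log sample is either just below or just above the core.
        pos = bisect_left(log_depths, depth)
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(log_depths)]
        if not candidates:
            matches.append(None)
            continue
        best = min(candidates, key=lambda i: abs(log_depths[i] - depth))
        # Constraint: discard matches beyond the allowed depth offset.
        within = abs(log_depths[best] - depth) <= max_offset
        matches.append(best if within else None)
    return matches


def signed_offsets(core_depths, log_depths, matches):
    """Signed offsets (log depth minus core depth) for accepted matches.

    Their mean indicates bias and their spread indicates precision.
    """
    return [
        log_depths[i] - depth
        for depth, i in zip(core_depths, matches)
        if i is not None
    ]


# Hypothetical depths in metres: logs sampled every 0.5 m, three core plugs.
logs = [3000.0, 3000.5, 3001.0, 3001.5]
cores = [3000.1, 3000.8, 3005.0]
matched = match_cores_to_logs(cores, logs, max_offset=0.25)
offsets = signed_offsets(cores, logs, matched)
```

In this toy case the third core depth lies well outside the logged interval and is left unmatched, which is the behavior a constrained match is meant to provide. Summarizing `offsets` by mean and standard deviation is one simple way to check alignment bias and precision.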

How to cite: Bennani, R. and Wang, M.: Physics-Informed Machine Learning Workflow for Free Hydrocarbon Content (S1) Prediction in Organic-Rich Shale Formation, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-5306, https://doi.org/10.5194/egusphere-egu26-5306, 2026.