- ¹Department of Earth and Space Sciences, Southern University of Science and Technology, Shenzhen, China (yijunliu0810@gmail.com)
- ²Guangdong Provincial Key Laboratory of Geophysical High-resolution Imaging Technology, Southern University of Science and Technology, Shenzhen 518055, China
- ³Laoshan Laboratory, Qingdao, China
- ⁴School of Earth and Space Sciences, University of Science and Technology of China, Hefei, China
- ⁵Institute of Hydrogeology and Environmental Geology, Chinese Academy of Geological Sciences, Shijiazhuang, China
Continental geothermal heat flow (CGHF) is a fundamental constraint on lithospheric thermal structure, yet direct measurements remain sparse and unevenly distributed. Machine learning (ML) offers a promising approach for filling these observational gaps by capturing complex, nonlinear relationships between CGHF and multi-dimensional geophysical and geological observables.
To diagnose what controls the accuracy of such predictions, we design synthetic experiments that integrate geodynamic forward modeling with ML. Specifically, we simulate CGHF under controlled variations in crustal radiogenic heat production (RHP) and in the geometry of interfaces such as the Moho and the lithosphere-asthenosphere boundary. The resulting synthetic datasets, for which the ground truth is known, are used to train and test Random Forest models. Comparing model outputs against the known solutions lets us isolate and quantify the influence of each factor on ML prediction performance.
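The logic of such a synthetic experiment can be sketched in a few lines. The example below is a toy under loudly stated assumptions, not the workflow used in this study: the geodynamic forward model is replaced by a 1-D steady-state relation (mantle heat flow from a linear conductive gradient plus uniform crustal RHP), and the Random Forest is replaced by ordinary least squares so that the script needs only the Python standard library. All function names, parameter values, and sampling ranges are hypothetical.

```python
import random


def forward_heat_flow(z_lab_km, rhp_uw_m3, crust_km=35.0, k=3.0, d_temp=1300.0):
    """Toy surface heat flow in mW/m^2 (illustrative stand-in for a geodynamic model)."""
    q_mantle = k * d_temp / z_lab_km   # (W/m/K * K) / km  ->  mW/m^2
    q_crust = rhp_uw_m3 * crust_km     # uW/m^3 * km       ->  mW/m^2
    return q_mantle + q_crust


def fit_ols(features, target):
    """Least-squares coefficients (intercept first) from the normal equations."""
    rows = [[1.0] + list(r) for r in features]
    p = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, target))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(p):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [a - f * b for a, b in zip(m[r], m[c])]
    return [m[i][p] / m[i][i] for i in range(p)]


def predict(coef, features):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in features]


def r_squared(y, y_hat):
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def run_experiment(n=2000, seed=0):
    """Train on half of the synthetic samples, score R^2 on the other half."""
    rng = random.Random(seed)
    z_lab = [rng.uniform(80.0, 250.0) for _ in range(n)]   # hypothetical LAB depths, km
    rhp = [rng.uniform(0.3, 2.0) for _ in range(n)]        # hypothetical RHP, uW/m^3
    q = [forward_heat_flow(z, a) for z, a in zip(z_lab, rhp)]
    feature_sets = {
        "without_rhp": [[100.0 / z] for z in z_lab],
        "with_rhp": [[100.0 / z, a] for z, a in zip(z_lab, rhp)],
    }
    half = n // 2
    scores = {}
    for name, x in feature_sets.items():
        coef = fit_ols(x[:half], q[:half])
        scores[name] = r_squared(q[half:], predict(coef, x[half:]))
    return scores


if __name__ == "__main__":
    for name, score in run_experiment().items():
        print(f"{name}: R^2 = {score:.2f}")
```

Because the ground truth is generated by `forward_heat_flow`, any gap between the two scores can be attributed to the withheld RHP feature alone, which is the diagnostic idea behind the synthetic experiments.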
Our experiments reveal that inadequate knowledge of spatially variable crustal RHP is the primary bottleneck for prediction accuracy, accounting for the persistent performance ceiling (R² ≈ 0.45–0.52) observed when RHP information is unavailable. In contrast, short-wavelength interface variations unresolved by current geophysical observations have a negligible influence on model performance. Moreover, ML models that exhibit benign overfitting, fitting the training data closely while still generalizing, consistently outperform their conventionally regularized counterparts, indicating that benign overfitting can enhance rather than impair CGHF prediction. Importantly, despite limited RHP constraints, ML models successfully extract the deep lithospheric thermal state from the available geophysical features, enabling reliable prediction of large-scale CGHF patterns.
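Why missing RHP information caps performance can be illustrated with a simple variance argument. This is an idealized sketch, not a result of this study: it assumes 1-D steady-state conduction with uniform heat production $A$ in a crust of constant thickness $D$, a mantle contribution $q_m$ that the available features determine exactly, and $A$ statistically independent of $q_m$. The best possible predictor is then $q_m + D\,\mathbb{E}[A]$, and the unexplained variance is entirely the RHP term.

```latex
q_s = q_m + A D,
\qquad
R^2_{\max}
  = 1 - \frac{D^2 \operatorname{Var}(A)}{\operatorname{Var}(q_m) + D^2 \operatorname{Var}(A)}
  = \frac{\operatorname{Var}(q_m)}{\operatorname{Var}(q_s)}
```

Under these assumptions no model, however flexible, can exceed $R^2_{\max}$ without information on $A$. The ceiling values reported above come from the full synthetic experiments, not from this formula.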
Applying these findings to real-world prediction, we construct a new global CGHF model (0.5° × 0.5°) that reproduces large-scale thermal patterns with high fidelity (R² = 0.79, MAE = 5.31 mW m⁻²) while resolving plausible regional variations in areas such as Greenland and the Songliao Basin. The moderate point-wise accuracy reflects inherent data limitations, primarily the poor characterization of crustal RHP, with additional degradation from local geological processes and measurement representativeness issues. Our results highlight a pressing need for improved crustal RHP constraints and demonstrate that the synthetic experiment approach developed here provides a transferable diagnostic tool for evaluating and guiding future data-driven CGHF predictions.
How to cite: Liu, Y., Yang, T., Guo, P., Ding, M., Li, Z., and Xi, Y.: Diagnosing machine learning for continental geothermal heat flow prediction: Insights from geodynamic synthetic experiments, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7059, https://doi.org/10.5194/egusphere-egu26-7059, 2026.