- 1 University of Bologna, Bologna, Italy (leticia.garayromero2@unibo.it)
- 2 Istituto Nazionale di Geofisica e Vulcanologia (INGV), Bologna Section, Italy
- 3 Istituto Nazionale di Geofisica e Vulcanologia (INGV), Rome Section, Italy
The prediction of induced seismicity is a critical challenge for geological risk management and the safe operation of industrial facilities, such as geothermal projects. This study focuses on the Cooper Basin in Australia. We applied data science and machine learning techniques to analyze seismic time series, integrating two data sources: discrete seismological events (23,285 events) and continuous operational data sampled every 2 minutes (33,839 records).
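One plausible way to combine the two sources is to bin the discrete events onto the same 2-minute grid as the operational records. The sketch below is illustrative only and is not taken from the study: the function name `bin_events`, the tuple layout `(time, magnitude)`, and the sample values are assumptions.

```python
from datetime import datetime, timedelta

def bin_events(events, start, n_bins, step=timedelta(minutes=2)):
    """Aggregate (time, magnitude) events onto a regular grid.

    Returns two lists of length n_bins: the event count per bin and the
    maximum magnitude per bin (None where a bin holds no events).
    """
    counts = [0] * n_bins
    max_mags = [None] * n_bins
    for t, mag in events:
        idx = (t - start) // step  # timedelta // timedelta gives an int
        if 0 <= idx < n_bins:      # ignore events outside the grid
            counts[idx] += 1
            if max_mags[idx] is None or mag > max_mags[idx]:
                max_mags[idx] = mag
    return counts, max_mags

# Hypothetical catalogue: three events within the first six minutes.
start = datetime(2020, 1, 1, 0, 0)
events = [
    (start + timedelta(seconds=30), 0.5),
    (start + timedelta(seconds=90), 1.2),
    (start + timedelta(minutes=5), -0.3),
]
counts, max_mags = bin_events(events, start, n_bins=3)
```

Once binned, each grid step carries both the operational measurements and the seismic summaries, so the two series can be joined on the bin index.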
The main objective was to develop machine learning models to predict two key variables in future time windows of 10, 30, 60, and 90 minutes: the number of seismic events and the maximum magnitude. The XGBoost and Random Forest algorithms were trained and compared. Model performance was evaluated using the R², RMSE, and MAE metrics, and model interpretability was analyzed using SHapley Additive exPlanations (SHAP).
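As a minimal sketch of the prediction setup, assuming the seismic data have already been aggregated into per-step counts and maximum magnitudes on the 2-minute grid, the future-window targets and the three evaluation metrics could be computed as follows. The helper names `future_targets` and `regression_metrics` and all sample values are hypothetical, and the study's actual feature and target construction may differ.

```python
import math

STEP_MIN = 2  # sampling interval of the operational data, in minutes

def future_targets(counts, max_mags, horizon_min):
    """For each grid step i, summarize the next horizon_min minutes.

    Returns a list of (event_count, max_magnitude) tuples; max_magnitude
    is None when the future window contains no events.
    """
    k = horizon_min // STEP_MIN  # number of grid steps in the window
    targets = []
    for i in range(len(counts) - k):
        window = range(i + 1, i + 1 + k)  # strictly after step i
        n_events = sum(counts[j] for j in window)
        mags = [max_mags[j] for j in window if max_mags[j] is not None]
        targets.append((n_events, max(mags) if mags else None))
    return targets

def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAE) for two equal-length numeric sequences."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return r2, rmse, mae

# Hypothetical per-step series covering 12 minutes.
counts = [2, 0, 1, 3, 0, 0]
max_mags = [1.2, None, -0.3, 0.8, None, None]
targets_10 = future_targets(counts, max_mags, horizon_min=10)

r2, rmse, mae = regression_metrics([1, 2, 3, 4], [1, 2, 3, 6])
```

Keeping the window strictly after the current step prevents information from the target interval leaking into the features observed at that step.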
The results show that both models generate predictions consistent with the observations, with better predictive performance in the longer time windows (60 and 90 minutes). This approach provides a framework for the monitoring and proactive risk assessment of geothermal operations.
How to cite: Garay Romero, L. R., Faenza, L., Garcia-Aristizabal, A., and Lombardi, A. M.: Application of Data Science and Machine Learning Techniques for the Prediction of Induced Seismicity, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-12815, https://doi.org/10.5194/egusphere-egu26-12815, 2026.