EGU25-10556, updated on 15 Mar 2025
https://doi.org/10.5194/egusphere-egu25-10556
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Friday, 02 May, 14:00–15:45 (CEST), Display time Friday, 02 May, 14:00–18:00 | Hall X3, X3.9
Machine Learning for High-Accuracy Co-Seismic Landslide Risk Prediction Using Multi-Parametric Data: A Case Study of M7.2 Hualien Earthquake
Yu Hsuan Ou Yang1, Wei An Chao1,2, and Che Ming Yang3
  • 1Department of Civil Engineering, National Yang Ming Chiao Tung University, Hsinchu City 300093, Taiwan
  • 2Disaster Prevention and Water Environment Research Center, National Yang Ming Chiao Tung University, Hsinchu City 300093, Taiwan
  • 3Department of Civil and Disaster Prevention Engineering, National United University, Miaoli County 360302, Taiwan

Taiwan, situated at the junction of the Ryukyu Arc and the Philippine Arc, experiences frequent seismic activity because it lies on a tectonic plate boundary. Earthquake-induced landslides are therefore among its most common geological hazards. For disaster mitigation, it is crucial to predict the spatial distribution of such landslides accurately once an earthquake has occurred. This study assesses the landslide risk triggered by the 3 April 2024 Hualien earthquake, which caused severe damage and claimed 18 lives, using four machine learning models: Random Forest (RF), Support Vector Machine (SVM), Gradient Boosting Machine (GBM), and K-Nearest Neighbors (KNN). Logistic Regression (LR) was not evaluated in this study because of its limitations in disaster prediction. LR is advantageous for small datasets with few independent variables, but it has significant drawbacks in high-dimensional, multi-variable scenarios. Its simple structure tends to underfit, which leads to inferior predictive performance, and fitting it becomes computationally intensive on large-scale data. In contrast, models such as RF, SVM, and GBM, along with ensemble techniques, are better suited to the complexity of earthquake-induced landslide prediction.

The models were trained on a dataset of 3191 data points described by topographic, geological, and seismic variables: slope-related factors, curvature, elevation, aspect, lithology, peak ground acceleration (PGA), peak ground velocity (PGV), and distances to nearby faults and rivers. Each data point carried a binary label: 1 for coseismic landslide (CL) and 0 for non-coseismic landslide (NCL). The dataset was divided into two subsets: 70% formed the training set used to build and fine-tune the models, and the remaining 30% formed the test set used to assess their predictive performance. The four models were compared on the basis of their confusion matrices. All four achieved an accuracy above 0.95. SVM reached the highest accuracy at 0.9822, followed by GBM (0.9702), RF (0.9697), and KNN (0.9530). The superior performance of SVM can be attributed to its ability to handle high-dimensional, nonlinear data: kernel functions transform the feature space and the margin between classes is maximized, which improves classification precision and generalization.
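The evaluation steps described above (a 70/30 split, a confusion matrix, and accuracy) can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the function names `split_indices`, `confusion_counts`, and `accuracy`, the random seed, the rounding rule for the split, and the toy labels are all assumptions introduced here.

```python
import random


def split_indices(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and split them into training and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_train = round(n_samples * train_fraction)  # rounding rule is an assumption
    return indices[:n_train], indices[n_train:]


def confusion_counts(y_true, y_pred):
    """Count (tn, fp, fn, tp) for binary labels: 1 = CL, 0 = NCL."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    return tn, fp, fn, tp


def accuracy(tn, fp, fn, tp):
    """Fraction of correct predictions: (tn + tp) / total."""
    total = tn + fp + fn + tp
    return (tn + tp) / total if total else 0.0


# Toy labels invented for illustration only (not data from the study).
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts, accuracy(*counts))
```

With 3191 samples, this rounding rule gives 2234 training and 957 test indices; the abstract does not state the exact counts used.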

To further enhance prediction reliability, an ensemble model was built from the RF, SVM, and GBM models. The KNN model, which had the lowest accuracy, was excluded, which also kept the number of voting models odd. The final prediction of the ensemble was determined by majority vote among the three models, substantially reducing prediction errors.
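A majority (hard) vote over three binary classifiers can be sketched as follows. The function name `majority_vote` and the example prediction lists are hypothetical; the abstract does not specify how the vote was implemented.

```python
def majority_vote(*model_predictions):
    """Combine binary (0/1) predictions from an odd number of models.

    Each argument is one model's list of predictions for the same samples.
    A sample is labeled 1 (CL) when more than half of the models predict 1.
    """
    n_models = len(model_predictions)
    if n_models == 0 or n_models % 2 == 0:
        raise ValueError("an odd number of models is required to avoid ties")
    n_samples = len(model_predictions[0])
    if any(len(preds) != n_samples for preds in model_predictions):
        raise ValueError("all models must predict the same number of samples")
    threshold = n_models // 2  # with 3 models, at least 2 votes are needed
    return [1 if sum(votes) > threshold else 0
            for votes in zip(*model_predictions)]


# Hypothetical predictions from three models for four locations.
rf_pred = [1, 0, 1, 0]
svm_pred = [1, 1, 0, 0]
gbm_pred = [0, 1, 1, 0]
print(majority_vote(rf_pred, svm_pred, gbm_pred))
```

An odd number of voters guarantees that a two-class vote cannot tie, which is the practical reason for voting with three models rather than four.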

The ensemble approach is also more dependable than logistic regression models. While logistic regression struggles with high-dimensional, non-linear, and strongly correlated geophysical variables, the ensemble of three machine learning models (RF, SVM, and GBM) combines their strengths to tackle these challenges. By leveraging the diversity of its members, the ensemble reduces overfitting and makes predictions more robust, which demonstrates its capability in addressing the complexities of coseismic landslide prediction.
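One way to see why voting can reduce errors is to compute the error rate of a three-model majority vote under the idealized assumption that the models err independently. This assumption is introduced here for illustration and is not claimed in the abstract; real models trained on the same data usually make correlated errors, so the actual gain is smaller. The function name and the error rates below are hypothetical.

```python
def majority_error_if_independent(e1, e2, e3):
    """Probability that at least two of three models are wrong on a sample.

    e1, e2, e3 are the individual error rates. Independence of the errors
    is an assumption of this sketch, not a property of real classifiers.
    """
    exactly_two_wrong = (
        e1 * e2 * (1 - e3)
        + e1 * (1 - e2) * e3
        + (1 - e1) * e2 * e3
    )
    all_three_wrong = e1 * e2 * e3
    return exactly_two_wrong + all_three_wrong


# Hypothetical error rates: three models that are each wrong 10% of the time.
print(majority_error_if_independent(0.1, 0.1, 0.1))
```

For equal error rates e, the expression simplifies to 3e² − 2e³, which is below e whenever e is less than 0.5.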

How to cite: Ou Yang, Y. H., Chao, W. A., and Yang, C. M.: Machine Learning for High-Accuracy Co-Seismic Landslide Risk Prediction Using Multi-Parametric Data: A Case Study of M7.2 Hualien Earthquake, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-10556, https://doi.org/10.5194/egusphere-egu25-10556, 2025.