Machine Learning-based Site Classification System for Earthquake-Induced Multi-Hazard in South Korea
- Korea Institute of Geoscience and Mineral Resources, Korea, Republic of (adoogen@kigam.re.kr)
Earthquake-induced ground deformation and structural failure are more severe over soft soils than over firm soils and rock, owing to seismic site effects and liquefaction. Site-specific seismic effects, including ground-motion amplification, liquefaction, and landslides, carry spatial uncertainty that depends on local subsurface, surface geological, and topographic conditions. During the 2017 Pohang earthquake (M 5.4), South Korea’s second strongest earthquake in decades, severe damage influenced by variable site response and vulnerability indicators was concentrated in basin and basin-edge areas underlain by unconsolidated Quaternary sediments. Nationwide site characterization is therefore essential, and it should draw on empirical correlations among geotechnical site response parameters, hazard parameters, and surface proxies. Furthermore, when there are many variables that are only weakly correlated, machine learning classification models can be more accurate than parametric methods. This study established a multivariate seismic site classification system using machine learning techniques on a geospatial big data platform.
Supervised machine learning classification techniques were adopted, specifically the random forest, support vector machine (SVM), and artificial neural network (ANN) algorithms. Supervised algorithms analyze a labeled training set that pairs input data with desired output values, and they produce an inferred function that can be used to make predictions from new input data. To optimize the classification criteria while accounting for geotechnical uncertainty and local site effects, principal component analysis (PCA) was applied to the training datasets, which were then verified with k-fold cross-validation. The best-performing algorithm was then selected using performance measures derived from the confusion matrix: the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC).
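The model-selection step described above (k-fold cross-validation followed by an AUC comparison) can be sketched as follows. This is a minimal, standard-library-only illustration and not the study's implementation. The data are synthetic, the two candidate models (`fit_centroid` and `fit_uninformative`) are hypothetical stand-ins for the random forest, SVM, and ANN, the PCA step is omitted, and the problem is reduced to two classes so that a single ROC AUC applies.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(labels, scores) if t == 1]
    neg = [s for t, s in zip(labels, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cross_validated_auc(X, y, fit, k=5, seed=0):
    """Score every sample with a model that never saw it, then pool the scores into one AUC."""
    scores = [0.0] * len(X)
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [j for j in range(len(X)) if j not in held]
        score = fit([X[j] for j in train], [y[j] for j in train])
        for j in held_out:
            scores[j] = score(X[j])
    return roc_auc(y, scores)

def fit_centroid(X_train, y_train):
    """Hypothetical stand-in model: higher score means closer to the class-1 centroid."""
    def centroid(cls):
        rows = [x for x, t in zip(X_train, y_train) if t == cls]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0, c1 = centroid(0), centroid(1)
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return lambda x: dist2(x, c0) - dist2(x, c1)

def fit_uninformative(X_train, y_train):
    """Hypothetical baseline: ignores the training data and scores by the noise feature."""
    return lambda x: x[1]

# Synthetic two-feature sites (assumed values): a Vs30-like feature and a pure-noise feature.
rng = random.Random(42)
X, y = [], []
for _ in range(100):
    X.append([rng.gauss(500.0, 60.0), rng.gauss(0.0, 1.0)]); y.append(0)  # "safe"
    X.append([rng.gauss(250.0, 60.0), rng.gauss(0.0, 1.0)]); y.append(1)  # "hazardous"

candidates = {"centroid": fit_centroid, "uninformative": fit_uninformative}
auc_by_model = {name: cross_validated_auc(X, y, fit) for name, fit in candidates.items()}
best_model = max(auc_by_model, key=auc_by_model.get)
print(auc_by_model, "selected:", best_model)
```

Pooling the out-of-fold scores into one AUC avoids folds that happen to contain a single class. For the four hazard classes named in the abstract, a one-vs-rest AUC per class would be one possible extension.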
For the southeastern region of South Korea, borehole log information (strata, standard penetration test results, etc.), a geological map (1:50k scale), a digital terrain model (5 m × 5 m grid), and a soil map (1:250k scale) were collected and compiled as geospatial big data. As a preliminary step, to build spatially coincident datasets of geotechnical response parameters and surface proxies, mesh-type geospatial information was generated using advanced geostatistical interpolation and simulation methods.
Site classification systems use seismic hazard parameters related to the geotechnical characteristics of the study area as classification criteria. The current site classification systems in South Korea and the United States recommend Vs30, the time-averaged shear wave velocity (Vs) over the top 30 m of the ground. This criterion reflects only the dynamic characteristics of the site and does not consider its geometric distribution characteristics. Thus, the geospatial information included the geo-layer thickness, surface proxies (elevation, slope, geological category, soil category), and Vs30. For liquefaction and landslide hazard estimation, liquefaction vulnerability indices (i.e., liquefaction potential index or liquefaction severity index) and landslide vulnerability indices (i.e., factor of safety or displacement) were also used as input features for training the classifier. Finally, the composite status with respect to seismic site effect, liquefaction, and landslide was predicted as a hazard class (i.e., safe, slight failure, moderate failure, or extreme failure) using the best-fitting classifier.
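Because Vs30 is a travel-time average rather than a thickness-weighted mean, a short worked example may help. The sketch below uses the standard definition, Vs30 = 30 / Σ(h_i / Vs_i), where h_i and Vs_i are the thickness and shear wave velocity of each layer within the top 30 m. The function name `vs30` and the layered profile are hypothetical and are not taken from the study's borehole data.

```python
def vs30(layers, depth=30.0):
    """Time-averaged shear wave velocity over the top `depth` metres.

    layers: (thickness_m, vs_m_per_s) pairs ordered from the surface down.
    """
    remaining = depth
    travel_time = 0.0
    for thickness, velocity in layers:
        if thickness <= 0 or velocity <= 0:
            raise ValueError("thickness and velocity must be positive")
        used = min(thickness, remaining)  # clip the layer that crosses the depth limit
        travel_time += used / velocity    # vertical travel time through this layer
        remaining -= used
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("profile is shallower than the averaging depth")
    return depth / travel_time

# Hypothetical profile: 5 m of soft soil, 10 m of stiff soil, then weathered rock.
profile = [(5.0, 150.0), (10.0, 300.0), (20.0, 600.0)]
print(round(vs30(profile), 1))  # about 327.3 m/s
```

The thickness-weighted mean of the same top 30 m would be 425 m/s, so the travel-time average gives the slow surface layer more weight. The value still does not say how thick that layer is, which is why layer thickness enters the classifier as a separate feature.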
How to cite: Kim, H.: Machine Learning-based Site Classification System for Earthquake-Induced Multi-Hazard in South Korea, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-4757, https://doi.org/10.5194/egusphere-egu23-4757, 2023.