Ensemble learning on the benchmark dataset for landslide susceptibility zonation in Central Italy

Héctor Aguilera; Jhonatan Steven Rivera Rivera; Carolina Guardiola-Albert; Marta Béjar-Pizarro

doi:https://doi.org/10.5194/egusphere-egu23-16251

Session GM3.3

EGU23-16251, updated on 09 Jan 2024

EGU General Assembly 2023

© Author(s) 2024. This work is distributed under the Creative Commons Attribution 4.0 License.

Héctor Aguilera¹, Jhonatan Steven Rivera Rivera¹,², Carolina Guardiola-Albert¹, and Marta Béjar-Pizarro¹

¹Spanish Geological Survey (IGME-CSIC), Madrid, Spain
²ETSI Topografía, Geodesia y Cartografía, Universidad Politécnica de Madrid, Madrid, Spain

In response to the call for collaboration, we aim to develop landslide susceptibility maps for the benchmark study area using ensemble machine learning. Ensemble learning has proven successful for landslide susceptibility mapping in highly susceptible regions of South Korea (Kaavi et al., 2018) and China (Hu et al., 2020).

The benchmark dataset covers 7360 slope units in central Italy and provides 26 morphometric and thematic attributes together with two binary targets indicating the presence (1) or absence (0) of landslides. The first target (target 1) is balanced with respect to the number of zeros and ones; the second (target 2) is balanced with respect to the area covered by slope units labeled zero or one. For each of the two targets, we will use cross-validation to compare the performance of individual classifiers (logistic regression, naive Bayes, decision trees, k-nearest neighbors, support vector machines, and neural networks) as well as of bagging (e.g., random forest) and boosting (e.g., extreme gradient boosting, CatBoost) algorithms. The best-performing and most diverse models will then be selected based on standard performance metrics such as AUC and the Matthews correlation coefficient (MCC), fine-tuned, and combined using stacking and blending ensemble learning techniques.
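The comparison and stacking steps described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes scikit-learn, uses a synthetic table from `make_classification` as a stand-in for the benchmark dataset (only the count of 26 attributes is taken from the abstract), interprets AUC as the area under the ROC curve, and includes only a small subset of the classifiers listed. The hyperparameters are arbitrary, and blending is not shown.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the benchmark table (assumption): 26 features, binary target.
X, y = make_classification(n_samples=600, n_features=26, n_informative=8, random_state=0)

# A few candidate base learners; scale-sensitive models get a scaler in front.
candidates = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "naive_bayes": GaussianNB(),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scoring = {"auc": "roc_auc", "mcc": make_scorer(matthews_corrcoef)}


def evaluate(model):
    """Return mean cross-validated ROC AUC and MCC for one model."""
    res = cross_validate(model, X, y, cv=cv, scoring=scoring)
    return {"auc": float(res["test_auc"].mean()), "mcc": float(res["test_mcc"].mean())}


scores = {name: evaluate(model) for name, model in candidates.items()}

# Stacking: out-of-fold probabilities of the base learners feed a meta-learner.
stack = StackingClassifier(
    estimators=list(candidates.items()),
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
    stack_method="predict_proba",
)
scores["stacking"] = evaluate(stack)

# Print models ranked by mean AUC, best first.
for name, s in sorted(scores.items(), key=lambda kv: -kv[1]["auc"]):
    print(f"{name:14s} AUC={s['auc']:.3f} MCC={s['mcc']:.3f}")
```

In practice the base learners passed to the stack would be the tuned, best-performing and least correlated models rather than every candidate, and the same procedure would be repeated separately for each of the two targets.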

The best model will be retrained with different configurations of training and test sets to derive a distribution of errors, which provides a measure of uncertainty for each slope unit in the landslide susceptibility maps. Further, we will develop a landslide susceptibility index based on the results (e.g., probability distributions of the outcomes) to produce quantile-based susceptibility maps.
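One possible way to turn repeated retraining into a per-unit uncertainty measure and quantile-based classes is sketched below. Everything specific here is an assumption rather than the authors' procedure: the data are synthetic, a scaled logistic regression stands in for the best model, repeated stratified 5-fold cross-validation supplies the different training and test configurations, uncertainty is summarized as the width of the 5th to 95th percentile band of out-of-fold probabilities, and the index is simply the quintile class of the mean probability.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the slope-unit table (assumption): 26 features, binary target.
X, y = make_classification(n_samples=500, n_features=26, n_informative=8, random_state=1)

# Placeholder for the selected "best model" (assumption).
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

n_splits, n_repeats = 5, 10
rkf = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=1)

# proba[r, i]: out-of-fold landslide probability of unit i in repetition r.
# Within each repetition every unit falls in exactly one test fold.
proba = np.empty((n_repeats, len(y)))
for i, (train_idx, test_idx) in enumerate(rkf.split(X, y)):
    model.fit(X[train_idx], y[train_idx])
    proba[i // n_splits, test_idx] = model.predict_proba(X[test_idx])[:, 1]

# Per-unit summary of the distribution over repetitions.
mean_p = proba.mean(axis=0)
low, high = np.quantile(proba, [0.05, 0.95], axis=0)
width = high - low  # wider band = less stable prediction for that unit

# Quantile-based classes: quintiles of the mean probability, coded 1 (lowest) to 5.
edges = np.quantile(mean_p, [0.2, 0.4, 0.6, 0.8])
susceptibility_class = np.digitize(mean_p, edges) + 1

for c in range(1, 6):
    in_class = susceptibility_class == c
    print(f"class {c}: {in_class.sum()} units, mean band width {width[in_class].mean():.3f}")
```

Mapping `susceptibility_class` and `width` back to the slope-unit polygons would give one map of susceptibility and one of its uncertainty; other summaries of the distribution (for example the standard deviation) could be substituted for the band width.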

This work has been developed thanks to the pre-doctoral grant for the Training of Research Personnel (PRE2021-100044) funded by MCIN/AEI/10.13039/501100011033 and by "FSE invests in your future" within the framework of the SARAI project "Towards a smart exploitation of land displacement data for the prevention and mitigation of geological-geotechnical risks" PID2020-116540RB-C22 funded by MCIN/AEI/10.13039/501100011033.

How to cite: Aguilera, H., Rivera Rivera, J. S., Guardiola-Albert, C., and Béjar-Pizarro, M.: Ensemble learning on the benchmark dataset for landslide susceptibility zonation in Central Italy, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16251, https://doi.org/10.5194/egusphere-egu23-16251, 2023.