GM3.3 | Benchmark datasets for landslide susceptibility zonation
EDI

Landslide susceptibility, the spatial likelihood of landslide occurrence, is the subject of countless scientific publications. They use heterogeneous data and apply many different methods, mostly falling under the definition of statistical and/or machine learning, with the common feature of considering many input variables and a single target output denoting landslide presence. It is a classification problem: given N input variables assuming different values, with each combination associated with a possible 0/1 outcome, a model should be trained on some dataset, tested, and eventually applied to unseen data.
Relevant input data (“predictors”, “factors”, “independent variables”) are usually a mixed set of topographic, morphometric, environmental, and climatic variables, together with a landslide inventory. The choice of a specific method depends on software availability, personal background, and the existence of relevant literature in the area of interest. New methods are proposed regularly, and it is very often difficult to judge their performance relative to existing methods.
A meaningful comparison of many different methods would require a common dataset (a benchmark) to train and test each of them in a systematic way. This is standard procedure in machine learning science and practice, in virtually all fields: benchmark datasets exist for medical sciences, image recognition, linguistics, and in general for any classification algorithm. The “Iris dataset” is a famous benchmark for the classification of numerical data into three different species of the Iris flower. This session aims at establishing one or more benchmark datasets that could be helpful in landslide susceptibility research, to compare the plethora of existing methods and new methods to come.
We propose an interactive session: the organizers will single out benchmark datasets and share them with participants in due time, prior to the conference. We expect abstract proposals to describe the method(s) the authors intend to apply, the type of data the methods require, and an independent case study for which the methods proved successful. Participants should be ready to disclose minimal computer code (in any major programming language) to run their method, to apply the code to the benchmark dataset prior to the conference, and to present their results. We aim at collecting all of the results in a journal publication, including datasets, benchmarks and computer code, in collaboration with the participants.
Download dataset at: http://dx.doi.org/10.31223/X52S9C

Public information:

Benchmark dataset described in:

http://dx.doi.org/10.31223/X52S9C

Download dataset at:

https://geomorphology.irpi.cnr.it/tools/slope-units/slope-units-map/dataset_benchmark.zip

Co-organized by ESSI1/NH3
Convener: Massimiliano Alvioli | Co-conveners: Liesbet Jacobs (ECS), Marco Loche (ECS), Carlos H. Grohmann
Orals
| Tue, 25 Apr, 08:30–10:15 (CEST)
 
Room G1
Posters on site
| Attendance Tue, 25 Apr, 10:45–12:30 (CEST)
 
Hall X3
Posters virtual
| Attendance Tue, 25 Apr, 10:45–12:30 (CEST)
 
vHall SSP/GM

Orals: Tue, 25 Apr | Room G1

Chairpersons: Massimiliano Alvioli, Txomin Bornaetxea, Marco Loche
08:30–08:35
08:35–08:45
|
EGU23-733
|
GM3.3
|
ECS
|
On-site presentation
Flavius Sirbu

Random Forest (RF) is a classification algorithm used successfully in geomorphological and hazard mapping (Sîrbu et al., 2019). It performs a defined number of classifications, based on decision trees, on random samples drawn with replacement from the original training data. Because of this, the algorithm is especially robust to errors and outliers in the training data, and it is also very good at producing uncertainty estimates for the variability of results on each of the classified features. Its output can also be used, with different methods, to produce a ranking of the independent variables used in the classification.

The present study was performed on a given dataset, in central Italy, containing 7,360 slope units covering an area of 4,095 km². The slope units are classified twice, based on different methodologies, into units with or without landslides. Each slope unit is also assigned 26 attributes that were used as independent variables (Alvioli et al., 2022). The slope units are treated as spatially independent from each other and were randomly split 70%-30% into training and validation data, respectively.

The model was set up as computer code in the R software environment. It uses different libraries to integrate the input data, run the algorithm, run a validation, measure the performance of the model, and finally produce the output data. Most of the model settings were left at their default values; the number of classification trees (ntree) was the only important setting that was fine-tuned, to a value of 1501, based on different model runs.

The results of the two classifications (one for each classification of the dependent variable) are relatively similar, confirming once again the robustness of the RF algorithm to minor and medium changes in the input data. The first classification had an AUC (area under the curve) value of 0.829, compared with an AUC value of 0.817 for the second classification. For each classification, a ranking of the independent variables was produced, with the standard deviation of slope being the most important predictor. Other predictors with relatively high importance were elevation and curvatures.
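For readers less familiar with the metric, the following is a minimal sketch of how an AUC value such as those quoted above can be computed from binary labels and model scores, using the pairwise (rank-sum) identity. The function name, labels and scores are illustrative assumptions and are not taken from the author's R code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count positive/negative pairs ranked correctly; ties count one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = slope unit with landslides) and model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 8 of the 9 pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of units with and without landslides.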

The results show that RF is an important classifier, which can be used with relatively few custom settings and on almost any dataset to produce a reliable susceptibility map. Its integration with the R software makes it easy to run the whole process virtually automatically. The computer code for the model will be made freely available.

How to cite: Sirbu, F.: Landslide Susceptibility Model based on Random Forest classification, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-733, https://doi.org/10.5194/egusphere-egu23-733, 2023.

08:45–08:55
|
EGU23-2283
|
GM3.3
|
ECS
|
Virtual presentation
Txomin Bornaetxea, Mina Yazdani, and Mauro Rossi

We propose the use of the LAND-SUITE software to build 16 landslide susceptibility models exploiting the benchmark dataset provided by the session organizers. The software allows the application of Linear Discriminant Analysis (LDA), Logistic Regression (LR) and Quadratic Discriminant Analysis (QDA) as statistical methods, together with the Combination Forecast Model (CFM), which combines the outputs of the former three methods. Each of these models has been applied considering the two different landslide presence variables provided (presence1 and presence2), resulting in 8 susceptibility maps that take into account the complete set of explanatory variables. Then, we took advantage of the variable analysis outputs provided by LAND-SUITE, and the process was repeated with a reduced set of 10 explanatory variables. The variable selection followed the principle of independence between the explanatory variables, while trying to optimize the contribution of each of them to the model performance, for which leave-one-out tests and the significance p-values of the LR outputs were consulted. Results show a slight, but generalized, improvement of the model performance when the presence2 dataset is used instead of presence1. The model performance is also maintained or very sensitively decreased when the number of explanatory variables is reduced from 26 to 10. However, the Area Under the ROC Curve (AUC) ranges between 0.75 and 0.82 in all of the tests. In addition, 9 out of the 10 selected variables are the same for both presence1 and presence2 tests. The uncertainty associated with each of the models has also been computed by means of the bootstrap resampling method.
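As an illustration of the bootstrap resampling idea mentioned above, the sketch below computes a percentile interval for a statistic by resampling with replacement. It is a generic example under stated assumptions: the function name, defaults and sample values are hypothetical, and it does not reproduce how LAND-SUITE implements its uncertainty estimates.

```python
import random
import statistics


def bootstrap_interval(values, stat=statistics.mean, n_boot=1000,
                       alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat` computed on `values`."""
    rng = random.Random(seed)
    n = len(values)
    estimates = []
    for _ in range(n_boot):
        # Resample with replacement, same size as the original sample.
        sample = [values[rng.randrange(n)] for _ in range(n)]
        estimates.append(stat(sample))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical susceptibility estimates for one slope unit from repeated fits.
susceptibility = [0.61, 0.58, 0.66, 0.70, 0.55, 0.63, 0.59, 0.68]
low, high = bootstrap_interval(susceptibility)
```

The width of the interval gives a simple measure of how much the estimate varies under resampling of the data.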

How to cite: Bornaetxea, T., Yazdani, M., and Rossi, M.: Application of the LAND-SUITE software with a benchmark dataset for landslide susceptibility zonation, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-2283, https://doi.org/10.5194/egusphere-egu23-2283, 2023.

08:55–09:05
|
EGU23-5755
|
GM3.3
|
ECS
|
On-site presentation
Marko Sinčić, Sanja Bernat Gazibara, Martin Krkač, Hrvoje Lukačić, and Snježana Mihalić Arbanas

As identified by previous work, landslides present a significant hazard in the Umbria Region, Central Italy. We present a Weight of Evidence (WoE) and Random Forest (RF) approach for deriving landslide susceptibility maps (LSMs) using slope units (SU) as the cartographic unit. The input data used in this study include a layer containing 7360 SU with 26 landslide conditioning factors (LCFs) and two landslide presence flags. Namely, “presence1” (P1) and “presence2” (P2) describe 3594 and 2271 SU as unstable, respectively. LCFs were reclassified into 10 classes using Natural Breaks, followed by collinearity testing, which resulted in 11 LCFs being selected for further analyses. Unstable SU were randomly split into two equal sets, one for deriving LSMs and the other for validation. While WoE used only unstable SU, the landslide dataset applied in RF additionally included an equal number of stable SU. Stable SU were randomly selected from the area which excluded only the previously selected unstable SU, simulating a temporal inventory for landslide validation. The latter ensured application of the model to unseen data, as well as an unbiased landslide dataset for training the model. Model evaluation and LSM validation included determining the Area Under the Curve (AUC) for the LSM area, defined by the cumulative percentage of study area in susceptibility classes and the cumulative percentage of landslide area in susceptibility classes. For model evaluation, 50% of unstable SU were examined, whereas the remaining 50% of unstable SU were used for validation. For model classification parameters, all SU were used to define Overall Accuracy (OA) and a Hit Rate and False Alarm Rate curve, for which AUC was calculated. The RF model performed excellently, with AUC values of 86.16 and 90.00 for the P1 and P2 scenarios, respectively. The WoE P1 and P2 scenarios performed significantly worse, with AUC values of 62.09 and 69.41, respectively. LSM validation on unseen data favors WoE, with AUC values of 60.46 (P1) and 66.17 (P2), compared to 45.06 (P1) and 56.68 (P2) for RF, which indicates a random-guess prediction. Considering OA and AUC as classification parameters, OA values for the P1 and P2 scenarios in RF are 74.36 and 77.60, whereas AUC values are 81.65 and 84.61. The WoE method scores significantly lower, with OA values of 66.03 and 69.14 for the P1 and P2 scenarios, respectively. Similarly, the WoE AUC value is 74.09 for P1 and 77.07 for P2. Because the P2 scenario shows better results in all four studied parameters for both methods, we point it out as the better option for defining landslide datasets with regard to the number of unstable and stable SU. Because unstable SU make up a relatively large portion of the input data, we argue that classification parameters should be prioritized when choosing the optimal method and scenario, as they take into consideration both unstable and stable SU for the entire study area. Based on the conducted research, we suggest using RF, due to its better classification performance, as an approach for landslide susceptibility analyses and future zonation in the study area.
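For orientation, the sketch below shows the textbook Weight of Evidence calculation for a single class of one conditioning factor, counting slope units. It is a generic formulation and an assumption on our part: the abstract does not give the authors' exact formulas, and the function name and counts are hypothetical.

```python
import math


def weights_of_evidence(n_class_ls, n_class, n_total_ls, n_total):
    """Return (W+, W-, contrast) for one class of a conditioning factor.

    n_class_ls: unstable units inside the class
    n_class:    all units inside the class
    n_total_ls: unstable units in the study area
    n_total:    all units in the study area
    Zero counts make the logarithms undefined and are not handled here.
    """
    n_stable = n_total - n_total_ls
    # P(class | landslide) and P(class | no landslide)
    p_class_given_ls = n_class_ls / n_total_ls
    p_class_given_stable = (n_class - n_class_ls) / n_stable
    w_plus = math.log(p_class_given_ls / p_class_given_stable)
    w_minus = math.log((1 - p_class_given_ls) / (1 - p_class_given_stable))
    return w_plus, w_minus, w_plus - w_minus


# Hypothetical counts: 30 of the 100 units in a class are unstable,
# against 100 unstable units out of 1000 in the whole area.
w_plus, w_minus, contrast = weights_of_evidence(30, 100, 100, 1000)
```

A positive W+ means the class is over-represented among unstable units; the contrast (W+ minus W-) summarizes the strength of the association.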

How to cite: Sinčić, M., Bernat Gazibara, S., Krkač, M., Lukačić, H., and Mihalić Arbanas, S.: A slope units based landslide susceptibility analyses using Weight of Evidence and Random Forest, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5755, https://doi.org/10.5194/egusphere-egu23-5755, 2023.

09:05–09:15
|
EGU23-6259
|
GM3.3
|
ECS
|
Virtual presentation
Haojie Wang, Limin Zhang, and Lin Wang

Rain-induced natural terrain landslides are the most frequent geo-hazard in many regions of the world. As an essential tool in addressing rising landslide challenges due to climate change, landslide susceptibility assessment has been widely investigated in Hong Kong for over twenty years. However, a public dataset for Hong Kong landslide susceptibility assessment is currently absent in the geoscience research community, which makes it difficult to establish consistent evaluation criteria for testing any new method or theory. Thus, to facilitate the development of new statistical and/or artificial intelligence-based methods for landslide susceptibility assessment, here we compile the first version of The Hong Kong University of Science and Technology – Landslide Susceptibility Dataset (HKUST-LSD) based on multiple sources of open data. Aiming at comprehensively describing the rain-induced natural terrain landslide conditioning factors in Hong Kong, HKUST-LSD v1.0 comprises (a) a landslide inventory; (b) a high-resolution digital terrain model (DTM) and its topographical derivatives; (c) superficial geology, distance to faults and rivers/sea; (d) historical maximum rolling rainfall; and (e) ground vegetation condition. HKUST-LSD v1.0 provides a ready-to-use dataset that includes processed landslide and non-landslide samples, together with reference codes that use representative machine learning techniques to assess landslide susceptibility in Hong Kong and achieve satisfactory performance. The dataset will be updated on a regular basis to meet the latest needs that might arise in the research community and to support global sustainable development.

Download the dataset at: https://github.com/cehjwang/HKUST-LSD

How to cite: Wang, H., Zhang, L., and Wang, L.: HKUST-Landslide Susceptibility Dataset (HKUST-LSD): A benchmark dataset for landslide susceptibility assessment in Hong Kong, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6259, https://doi.org/10.5194/egusphere-egu23-6259, 2023.

09:15–09:25
|
EGU23-7907
|
GM3.3
|
On-site presentation
Corrado Camera and Greta Bajni

The aim of this study is to contribute to the introduction of a benchmark dataset for landslide susceptibility. The contribution consists in the application of Generalized Additive Models (GAMs) on the test area proposed by Alvioli et al. (2022), located in Central Italy (Umbria Region, 4095 km²), and over the Mountain Communities of Mont Cervin and Mont Emilius (670 km²), located in the central part of Valle d’Aosta Region. In the latter, previous studies regarding landslide susceptibility were carried out by Camera et al. (2021) and Bajni (2022).

The susceptibility analysis is based on slope units for both areas and uses the open-source dataset available for Italy (https://geomorphology.irpi.cnr.it/tools/slope-units, Alvioli et al., 2020). For Central Italy, predictors and response variables are those made available by Alvioli et al. (2022). For consistency, for Valle d’Aosta morphometric variables were calculated from the EU-DEM digital elevation model (Copernicus Land Monitoring Service, 25 m horizontal resolution), while soil-related variables (namely soil depth, soil bulk density and particle size fractions) were derived from the SoilGrids global dataset (Hengl et al., 2017). In addition, consistent with Alvioli et al. (2022), two presence/absence landslide response variables (‘1’/‘0’) were defined. For the first one, ‘presence1’, a slope unit was considered impacted by landslides (‘1’) if at least one event was recorded within its limits. For the second one, ‘presence2’, a slope unit was considered impacted by landslides (‘1’) if two or more landslides occurred within its limits. For Valle d’Aosta, landslide events were accessed through the regional inventory (http://catastodissesti.partout.it/), which is updated continuously by the Regional Civil Protection Department and the Forest Corps through regular surveys or following warnings from citizens.
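The two response variables defined above can be sketched as a simple thresholding of the number of landslides recorded in each slope unit. The function name and the slope-unit identifiers below are hypothetical and serve only to illustrate the rule stated in the text.

```python
def presence_flags(landslide_count):
    """Map a slope unit's landslide count to the two binary responses."""
    presence1 = 1 if landslide_count >= 1 else 0  # at least one event
    presence2 = 1 if landslide_count >= 2 else 0  # two or more events
    return presence1, presence2


# Hypothetical slope-unit IDs with the number of landslides in each.
counts = {"su_a": 0, "su_b": 1, "su_c": 4}
flags = {su: presence_flags(n) for su, n in counts.items()}
```

By construction, every unit flagged under ‘presence2’ is also flagged under ‘presence1’, so the second variable always yields fewer or equally many positive units.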

Two landslide susceptibility maps were calculated for each area (‘presence1’, ‘presence2’). GAMs were applied through the mgcv library of R, with and without the option of variable selection through shrinkage. In addition, predictor behavior was analyzed through the associated Component Smoothing Functions (CSF) to check for physical plausibility. Finally, to evaluate uncertainties, a non-spatial k-fold cross-validation was carried out, and a model evaluation was performed based on contingency tables, area under the receiver operating characteristic curve (AUROC) and variable importance (decrease in explained variance).

By the application of the same modelling algorithm (GAM) with an input dataset derived from the same data sources, the study is expected to verify the consistency of the obtained landslide susceptibility results in terms of both model performance and main driving processes (predictors).

References

Alvioli et al., 2020. Parameter-free delineation of slope units and terrain subdivision of Italy. Geomorphology 358, 107124. https://doi.org/10.1016/j.geomorph.2020.107124

Alvioli et al., 2022. Call for collaboration: Benchmark datasets for landslide susceptibility zonation. https://doi.org/10.31223/X52S9C

Bajni, 2022. Statistical methods to assess rockfall susceptibility in an Alpine environment: a focus on climatic forcing and geomechanical variables. https://doi.org/10.13130/bajni-greta_phd2022-03-23

Camera et al., 2021. Introducing intense rainfall and snowmelt variables to implement a process-related non-stationary shallow landslide susceptibility analysis. Science of The Total Environment 147360. https://doi.org/10.1016/j.scitotenv.2021.147360

Hengl et al., 2017. SoilGrids250m: Global gridded soil information based on machine learning. PLoS one 12, e0169748. https://doi.org/10.1371/journal.pone.0169748

How to cite: Camera, C. and Bajni, G.: Comparison of the effectiveness of application of GAMs for landslide susceptibility modelling in Apennine and Alpine areas, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-7907, https://doi.org/10.5194/egusphere-egu23-7907, 2023.

09:25–09:35
|
EGU23-11051
|
GM3.3
|
Virtual presentation
Guruh Samodra, Erwin Eko Wahyudi, and Nanang Susyanto

Numerous advanced techniques, including machine learning models, are widely used in landslide susceptibility zoning and yield very high accuracy. In some cases, very high accuracy reflects overfitting, where a model adapts very well to the training data but performs poorly on test or new data. Cross-validation (CV) strategies are often employed to reduce overfitting in a machine learning model. Several cross-validation techniques have been developed recently as part of the machine learning workflow. However, the grounds for preferring one cross-validation method over another are still unclear in landslide susceptibility zoning. To illustrate this issue, the authors reproduce non-CV, standard V-fold CV, and several spatial CV techniques using a benchmark dataset in Italy to train, validate and test an XGBoost model using 26 landslide controlling factors. The variation of ROC validation, ROC testing, and the confusion matrix was used to detect potential model overfitting. The preferred CV technique for the benchmark dataset in Italy will be discussed further. The result is expected to provide guidance for choosing a CV technique in landslide susceptibility zoning based on slope units and a machine learning workflow.
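The difference between standard and spatial cross-validation can be sketched as a difference in how slope units are assigned to folds. The example below is a deliberately simple illustration: the strip-based spatial split, the function names and the coordinates are assumptions, and the spatial CV variants used by the authors may differ.

```python
import random


def random_folds(n_units, k, seed=0):
    """Standard k-fold: shuffle unit indices and deal them into k folds."""
    idx = list(range(n_units))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def spatial_folds(coords, k):
    """Simple spatial k-fold: cut the area into k strips along the x axis.

    coords: list of (x, y) centroids, one per slope unit.
    Units in the same fold are spatial neighbours, so test units are
    geographically separated from most training units.
    """
    order = sorted(range(len(coords)), key=lambda i: coords[i][0])
    n = len(order)
    return [order[i * n // k:(i + 1) * n // k] for i in range(k)]


# Hypothetical slope-unit centroids.
coords = [(5.0, 0.0), (1.0, 0.0), (3.0, 0.0), (2.0, 0.0), (4.0, 0.0), (0.0, 0.0)]
strips = spatial_folds(coords, 3)
shuffled = random_folds(len(coords), 3)
```

With random folds, neighbouring (and often similar) units end up in both training and test sets, which tends to give more optimistic scores than a spatially separated split.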

How to cite: Samodra, G., Wahyudi, E. E., and Susyanto, N.: Cross validation technique preference for landslide susceptibility zoning based on slope unit and machine learning workflow, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-11051, https://doi.org/10.5194/egusphere-egu23-11051, 2023.

09:35–09:45
|
EGU23-11586
|
GM3.3
|
On-site presentation
Benjamin Mirus and Jacob Woodard

Bayesian logistic regression with vague priors and optimized XGBoost models are two contrasting and commonly used approaches for modeling landslide susceptibility. Logistic regression calculates the log odds of a binary outcome (i.e., landslide or no landslide) given some predictor data (e.g., slope, elevation, and geology) that describe the terrain of each mapping unit used to divide the terrain for susceptibility evaluation. The Bayesian implementation incorporates uncertainty into the model by using probability distributions of the model parameters. Weakly informative priors ensure that the likelihood function (i.e., observational data) dominates the posterior distributions, which can be estimated using the statistical software Stan. Like logistic regression, the gradient boosting decision tree machine learning algorithm XGBoost requires the predictor data of each mapping unit to output a probability of an event. Decision trees are a non-parametric learning tool that uses a set of if-then-else decision rules to predict the expected model outcome. Gradient boosting is a method of sequentially adding more decision trees to improve the model output until the lowest model residual levels are reached, while penalizing the level of complexity added to the model. We optimize the model parameters using a Bayesian cross-validation procedure on a portion of the training data. To obtain distributions of the level of susceptibility from XGBoost, a 10-fold cross-validation procedure with ten iterations is implemented. Both the Bayesian logistic regression and XGBoost algorithms are evaluated using the area under the receiver operating characteristic curve and the Brier score, but any other common evaluation metric is possible. Model development and evaluation are carried out in the computational environment R. These methods have been applied with success to many diverse regions of the United States and would benefit from testing with the benchmark datasets proposed by the conveners.
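Two of the quantities mentioned above lend themselves to a short sketch: the conversion from log odds to probability in logistic regression, and the Brier score used for evaluation. The coefficients, slopes and labels below are invented for illustration and are not taken from the authors' R or Stan code.

```python
import math


def logistic(log_odds):
    """Convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def brier_score(labels, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


# Hypothetical single-predictor model: log odds = intercept + coef * slope.
intercept, coef = -3.0, 0.1
slopes_deg = [10.0, 25.0, 40.0]
labels = [0, 0, 1]
probs = [logistic(intercept + coef * s) for s in slopes_deg]
score = brier_score(labels, probs)
```

The Brier score ranges from 0 (perfect probabilistic prediction) to 1; a constant prediction of 0.5 scores 0.25, which is a useful reference level.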

How to cite: Mirus, B. and Woodard, J.: Bayesian logistic regression and optimized XGBoost models for landslide susceptibility assessment, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-11586, https://doi.org/10.5194/egusphere-egu23-11586, 2023.

09:45–09:55
|
EGU23-12943
|
GM3.3
|
ECS
|
On-site presentation
Mateo Moreno and Stefan Steger

Grid cells (GC) and slope units (SU) are the most common mapping units in landslide susceptibility modeling. SU-based models have recently gained popularity in the field because of the availability of user-friendly software and certain advantages over GC approaches. For example, SUs are often described as more geomorphologically meaningful, less sensitive to positionally inaccurate landslide data and more flexible in representing specific variables (e.g., binary vs. count responses). In contrast to GCs, SU sizes can vary considerably within a study area. Spatially varying mapping unit sizes may be accompanied by a spatially varying likelihood of an SU being affected by a landslide. We assume that, simply because of their larger spatial extent, larger SUs are more likely to be labeled as "landslide-affected" than smaller SUs that are just as susceptible to landslides. In other words, the larger the area of investigation, the more likely it is that a landslide can be found. This may have relevant effects on subsequent landslide susceptibility models, especially if certain predictor variables correlate with SU sizes.

To our knowledge, the effects of different SU sizes on landslide susceptibility models have rarely been investigated, and no approaches to explicitly consider SU size have yet been presented. In this contribution, we use Generalized Additive Mixed Models (GAMM) to compare four different strategies for dealing with spatially varying SU sizes in landslide susceptibility modeling. The analyses focus on the provided SU-based dataset related to a part of the Umbria region in Central Italy (~4,100 km²). In the first strategy, all predisposing factors, including those directly related to SU size (i.e., SU area and distance/SU area), are used for model fitting and spatial prediction. The second strategy builds upon strategy 1, but it does not consider the size of the SUs for model fitting and spatial prediction. The third strategy demonstrates the ability of SU size to discriminate SUs with landslides from those without landslides and consists of a single-variable model with the area of the SUs as its only predictor. Then, in the fourth strategy, all predictors are used for model fitting, but the effect of SU size is averaged out from the spatial prediction (i.e., the size effect is not predicted into space, but its potentially confounding effect is isolated during the model fitting).

The first tests support the assumption that larger SUs are more likely to be labeled as landslide-affected and that the associated confounding effects should be considered in landslide susceptibility modeling. We present the four strategies in terms of modeled relationships, relative variable importance, spatial prediction patterns and quantitative validation results.

How to cite: Moreno, M. and Steger, S.: Slope unit size matters - why should the areal extent of slope units be considered in data-driven landslide susceptibility models?, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12943, https://doi.org/10.5194/egusphere-egu23-12943, 2023.

09:55–10:05
|
EGU23-13362
|
GM3.3
|
On-site presentation
Jewgenij Torizin and Nick Schüßler

In the presented study, we investigate the possibilities of performing tasks related to landslide susceptibility assessment (LSA) on the provided benchmark dataset. The slope unit-based dataset consists of aggregated predisposing factors and two label sets. Although initially introduced as a dataset for binary classification tasks, it is also suitable for zoning and regression analysis in combination with the underlying landslide inventory. Zoning ranks slope units to delineate the study area in susceptibility zones. In the regression analysis, we try to predict a numeric target value (e.g.,  landslide count) by the slope unit's attributes.

We explored the benchmark dataset using bivariate and multivariate statistical visualization techniques to better understand the relations in the data. We found the dataset at this stage insufficient for achieving a well-explainable high-performance classification using linear models. Most attributes are not specific enough to linearly separate the given labels. The chosen summary statistics (mean and standard deviation) may not sufficiently characterize the parameter distributions inside the slope units.

We propose a theoretical concept for zonation analysis to assess the best possible performance on the given discrete dataset using the success rate curve as the model evaluation metric. Because any applied algorithm cannot modify the geometry of the discrete slope units, the evaluation metric only depends on the relative ranking of slope units. The best performance is obtainable without computing a predictive model. For frequency-related models (weighting of factors with landslide count statistics), a simple direct computation of conditional probabilities or frequency ratio on the slope units as a ranking factor provides the best possible ranking. Combining the label and slope unit's area provides the best slope unit ranking for binary labels.
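The dependence of the success rate curve on the ranking of slope units alone can be illustrated with a short sketch. The tuple layout, function names and numbers below are assumptions made for the example; the authors' own scripts are in the notebook they reference.

```python
def success_rate_curve(units):
    """units: list of (score, area, landslide_amount), one per slope unit.

    Returns cumulative fractions (x = study area, y = landslides) after
    ranking the units from highest to lowest score.
    """
    ranked = sorted(units, key=lambda u: u[0], reverse=True)
    total_area = sum(u[1] for u in ranked)
    total_ls = sum(u[2] for u in ranked)
    xs, ys = [0.0], [0.0]
    cum_area = cum_ls = 0.0
    for _, area, landslides in ranked:
        cum_area += area
        cum_ls += landslides
        xs.append(cum_area / total_area)
        ys.append(cum_ls / total_ls)
    return xs, ys


def area_under(xs, ys):
    """Trapezoidal area under the curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


# Hypothetical units as (area, landslide count); score = landslide density.
raw = [(1.0, 3), (2.0, 1), (1.0, 0)]
by_density = [(n / a, a, n) for a, n in raw]
reversed_rank = [(-n / a, a, n) for a, n in raw]
best = area_under(*success_rate_curve(by_density))
worst = area_under(*success_rate_curve(reversed_rank))
```

Because the unit geometry is fixed, any two models that produce the same ordering of slope units produce the same curve, which is why a direct density-based ranking can serve as an upper reference.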

We conducted a regression and classification analysis with artificial neural networks (ANN), testing different combinations of parameters (sensitivity analysis) and architectures that allow for modeling nonlinear relations. In both analyses, initial results show that a complex net architecture can boost the model fit on the training dataset at the cost of predictive performance on test data. Also, the dataset pre-exploration corresponds well with the sensitivity analysis with ANN. The number of parameters is reducible to a few effective predictors without losing much accuracy in classification, which is poor to moderate depending on the label set used.

While slope units as an aggregation for geomorphological analyses remain undisputed, the proposed aggregation of predisposing factors in slope units at the analysis's entry point needs further discussion. Aggregating the results of a raster-based LSA to overcome deviances in landslide susceptibility patterns caused by data uncertainties or different methods could be more suitable at this point. Slope units should be analyzed with regression analysis in LSA to consider their different spatial extents during the calculation.

We provide our scripts, visualizations, and results as a Jupyter Notebook on our GitHub: https://github.com/BGR-EGHA/EGU23_GM3.3_ls_benchmark.

How to cite: Torizin, J. and Schüßler, N.: Exploring the benchmark dataset for tasks related to landslide susceptibility assessment, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13362, https://doi.org/10.5194/egusphere-egu23-13362, 2023.

10:05–10:15
|
EGU23-16251
|
GM3.3
|
On-site presentation
Héctor Aguilera, Jhonatan Steven Rivera Rivera, Carolina Guardiola-Albert, and Marta Béjar-Pizarro

In response to the call for collaboration, we aim to develop landslide susceptibility maps for the benchmark study area using Ensemble Machine Learning. Ensemble Learning has proven successful for landslide susceptibility mapping in highly susceptible Asian regions of South Korea (Kaavi et al., 2018) and China (Hu et al., 2020).

The benchmark dataset provided, encompassing 7360 slope units in the central region of Italy, has 26 morphometric and thematic attributes and two binary targets indicating the presence (1) or absence (0) of landslides. The first binary variable is balanced with respect to the number of zeros and ones (target 1) and the second in terms of the area covered by slope units labeled either with zero or one (target 2). For each of the two conditions in the dataset, we will use cross-validation to compare the performance of individual classifiers, such as logistic regression, naive Bayes, decision trees, k-nearest neighbors, support vector machines and neural networks, as well as bagging (e.g., random forest) and boosting (e.g., extreme gradient boosting, CatBoost) algorithms. Then the best and most diverse models will be selected based on typical performance metrics such as AUC and the Matthews Correlation Coefficient (MCC), fine-tuned, and combined using stacking and blending Ensemble Learning techniques.
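As a reference for the MCC metric mentioned above, the following sketch computes it from binary labels and binary predictions using the standard confusion-matrix formula. The function name and the example vectors are illustrative assumptions.

```python
import math


def matthews_corrcoef(labels, preds):
    """Matthews Correlation Coefficient for binary labels and predictions."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when a row or column of the matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical labels (1 = landslide present) and three sets of predictions.
labels = [1, 1, 0, 0]
perfect = matthews_corrcoef(labels, [1, 1, 0, 0])
chance = matthews_corrcoef(labels, [1, 0, 1, 0])
inverted = matthews_corrcoef(labels, [0, 0, 1, 1])
```

Unlike overall accuracy, the MCC uses all four cells of the confusion matrix, which makes it informative for both of the differently balanced targets.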

The best model will be re-trained with different configurations of training and test sets to derive a distribution of errors to add a measure of uncertainty in each slope unit of landslide susceptibility maps. Further, we will develop a landslide susceptibility index based on the results (e.g., probability distributions of the outcomes) to represent quantile-based susceptibility maps.

This work has been developed thanks to the pre-doctoral grant for the Training of Research Personnel (PRE2021-100044) funded by MCIN/AEI/10.13039/501100011033 and by "FSE invests in your future" within the framework of the SARAI project "Towards a smart exploitation of land displacement data for the prevention and mitigation of geological-geotechnical risks" PID2020-116540RB-C22 funded by MCIN/AEI/10.13039/501100011033.

How to cite: Aguilera, H., Rivera Rivera, J. S., Guardiola-Albert, C., and Béjar-Pizarro, M.: Ensemble learning on the benchmark dataset for landslide susceptibility zonation in Central Italy, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16251, https://doi.org/10.5194/egusphere-egu23-16251, 2023.

Posters on site: Tue, 25 Apr, 10:45–12:30 | Hall X3

Chairpersons: Marco Loche, Marko Sinčić
X3.35
|
EGU23-295
|
GM3.3
|
ECS
|
Hongzhi Cui, Marcel Hürlimann, Vicente Medina, and Jian Ji

Landslide susceptibility analysis is a necessary procedure for the timely discovery and location of potential sources of slope instability in natural terrain. The infinite slope model is broadly applied for evaluating shallow landslide susceptibility, coupling geotechnical and geological parameters with a hydrological model. Because rainfall is one of the major factors inducing landslides, the calculation of the water table and pore water pressure is an important task in our approach. To appropriately assess the most susceptible areas, we propose a new framework for regional slope stability based on probabilistic analysis, which couples a hydromechanical model, the Fast Shallow Landslide Assessment Model (FSLAM), with a reliability method. User-friendly software called the GIS-FSLAM-FORM plugin was designed and developed in the Python programming language on the open-source geographic information system (QGIS) platform. It accounts for the potential uncertainties of the geotechnical parameters (in particular effective cohesion and friction of soil or root strength), the horizontal hydraulic conductivity, and the soil depth. Our new approach stands out for its simple hydrologic model and its high computational efficiency. To obtain probabilistic information from FSLAM incorporating the infinite slope model, the first-order reliability method (FORM) is used in the analysis, although it inevitably involves iterative computing. The developed plugin, which uses physically based modelling, can directly provide several regional hazard index distribution maps, such as the factor of safety (FoS), reliability index (RI), and failure probability (Pf).
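To make the infinite slope model concrete, the sketch below evaluates a common textbook form of the factor of safety with slope-parallel seepage. It is an assumption-laden illustration: the function name, default unit weights and parameter values are hypothetical, and it does not reproduce the FSLAM hydrological model or the FORM reliability calculation of the plugin.

```python
import math


def factor_of_safety(cohesion, phi_deg, slope_deg, depth, water_height,
                     gamma_soil=19.0, gamma_water=9.81):
    """Infinite-slope factor of safety with slope-parallel seepage.

    cohesion in kPa; depth and water_height in m, measured vertically above
    the slip surface; unit weights in kN/m^3 (default values are assumptions).
    """
    beta = math.radians(slope_deg)
    phi = math.radians(phi_deg)
    normal_stress = gamma_soil * depth * math.cos(beta) ** 2
    pore_pressure = gamma_water * water_height * math.cos(beta) ** 2
    shear_stress = gamma_soil * depth * math.sin(beta) * math.cos(beta)
    resisting = cohesion + (normal_stress - pore_pressure) * math.tan(phi)
    return resisting / shear_stress


# Hypothetical soil column: 5 kPa cohesion, 30 degree friction angle,
# 35 degree slope, 1.5 m deep, first dry and then fully saturated.
fos_dry = factor_of_safety(5.0, 30.0, 35.0, 1.5, 0.0)
fos_wet = factor_of_safety(5.0, 30.0, 35.0, 1.5, 1.5)
```

A value below 1 indicates that driving stresses exceed resisting stresses; raising the water table lowers the factor of safety, which is why the hydrological component matters for rainfall-induced failures.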

How to cite: Cui, H., Hürlimann, M., Medina, V., and Ji, J.: GIS-FSLAM-FORM: A QGIS plugin for fast probabilistic susceptibility assessment of rainfall-induced landslides at regional scale, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-295, https://doi.org/10.5194/egusphere-egu23-295, 2023.

X3.36 | EGU23-3566 | GM3.3 | ECS | Highlight
Marco Loche, Massimiliano Alvioli, Ivan Marchesini, and Luigi Lombardo

We develop a slope-unit based landslide susceptibility model using the benchmark dataset proposed in the session, located in Central Italy. As a result, we produce two susceptibility maps based on the two different landslide presence attribute fields included in the dataset.

The proposed dataset is a subset of a much larger one, recently used to obtain landslide susceptibility all over Italy. We further explore the differences between the results obtained from the proposed dataset and the landslide susceptibility obtained at national scale. The national-scale results were obtained with a Bayesian version of a binomial Generalized Additive Model (GAM) in R-INLA, an R implementation of the integrated nested Laplace approximation for approximate Bayesian inference. The method can explain the spatial distribution of landslides using a family of Bernoulli exponential functions.

This allows us to estimate fixed effects and random effects, and to assess their associated uncertainty. The residual susceptibility maps and the most common correlation measures allow us to quantify the strength and direction of the relationships between models and to capture differences in susceptibility values across the study area. On this basis, we offer a convenient approach to evaluate the similarities between the two landslide distributions represented in the dataset.

We propose this model comparison for any pair of susceptibility maps, to evaluate the interpretability of the covariates and the model performance in cases where a large dataset may influence the spatial pattern of susceptibility.

How to cite: Loche, M., Alvioli, M., Marchesini, I., and Lombardo, L.: Landslide Susceptibility within the binomial Generalized Additive Model, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3566, https://doi.org/10.5194/egusphere-egu23-3566, 2023.

X3.37 | EGU23-6053 | GM3.3 | ECS
Gianvito Scaringi and Marco Loche

Developments of geostatistical models in landslide susceptibility mapping often do not consider interpretability, although this element is of fundamental importance for risk assessment. Recent trends in machine learning show that gains in performance affect the interpretability of the mechanical processes in geostatistical models, to the point that geomorphic causation is lost.

We took the benchmark dataset in central Italy as our case study, for which a complete inventory of landslides is available. We built two landslide susceptibility models using a Generalised Additive Model (GAM) with a slope-unit partitioning of the area (~4,100 km², comprising 7,360 slope units) and a set of 26 independent variables, with the aim of classifying the presence/absence of landslides.

We tested the capability of a binomial GAM with nonparametric smoothing functions to evaluate the interpretability of the covariates. Furthermore, we obtained satisfactory results in terms of performance, with a reasonable compromise on interpretability.

GAMs are very popular classifiers in landslide susceptibility and, even though other methods yield better performance, we suggest that interpretability in geostatistical analyses should proceed in tandem with improving the models' performance.
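For reference, a binomial GAM of the kind named above is usually written as below. This is the generic textbook form rather than the authors' exact specification, and the notation is an assumption made for illustration: $y_i$ is landslide presence/absence in slope unit $i$, $x_{ij}$ is the value of covariate $j$ in that unit, and each $f_j$ is a smooth function estimated from the data.

```latex
y_i \sim \mathrm{Bernoulli}(p_i), \qquad
\operatorname{logit}(p_i) = \log\frac{p_i}{1 - p_i}
  = \beta_0 + \sum_{j=1}^{J} f_j(x_{ij})
```

Because the predictor is additive, each $f_j$ can be plotted against its covariate on its own, which is the sense in which such a model remains interpretable.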

How to cite: Scaringi, G. and Loche, M.: Landslide Susceptibility Mapping via binomial Generalized Additive Model, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6053, https://doi.org/10.5194/egusphere-egu23-6053, 2023.

X3.38 | EGU23-9623 | GM3.3 | ECS
Sansar Raj Meena, Mario Floris, and Filippo Catani

Landslide inventories are essential for landslide susceptibility mapping, hazard modeling, and risk management. Experts and organizations all across the world have preferred manual visual interpretation of satellite and aerial imagery for decades. However, manual inventories have several issues, such as the subjective process of manually extracting landslide boundaries, the lack of sharing of landslide polygons within the geoscientific community, and the amount of time and effort invested in the inventory generation process by the expert interpreters. To address these challenges, a large amount of research on semi-automated and automatic mapping of landslide inventories has been conducted in recent years. The automatic development of landslide inventories using Artificial Intelligence (AI) approaches is still in its early stages, as there is currently no published study that can generate a ground truth representation of a landslide situation following a landslide-triggering event. In terms of landslide boundary delineation using AI-based models, the evaluation metrics in recent research suggest an F1-score in the range of 50-80%. With the exception of those that test the model in the same area used for training, very few studies claim to have attained an F1-score above 80%, particularly at larger scales of investigation. As a result, there is currently a research gap between the generation of AI-based landslide inventories and their applicability for landslide hazard and risk assessments. The geoscientific community needs to check the reliability of AI-generated landslide data before they are used in the subsequent phases of landslide response and mitigation in impacted areas.
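Since the argument above rests on F1-score ranges, a minimal sketch of how that metric is computed for a binary landslide mask may be useful. The function name and the flat 0/1 list representation of the mask are assumptions made for the example; real inventories would be rasters or polygons.

```python
def f1_score(predicted, reference):
    """F1-score for two equally long sequences of 0/1 labels.

    predicted: model output per pixel (1 = landslide, 0 = no landslide).
    reference: manually mapped label per pixel, same encoding.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    tp = sum(1 for p, r in zip(predicted, reference) if p == 1 and r == 1)
    fp = sum(1 for p, r in zip(predicted, reference) if p == 1 and r == 0)
    fn = sum(1 for p, r in zip(predicted, reference) if p == 0 and r == 1)
    denominator = 2 * tp + fp + fn
    # Convention chosen here: return 0.0 when neither mask contains landslides.
    return 2 * tp / denominator if denominator else 0.0


if __name__ == "__main__":
    predicted = [1, 1, 0, 0, 1]
    reference = [1, 0, 0, 1, 1]
    print(round(f1_score(predicted, reference), 3))
```

The metric ignores true negatives, so it is not inflated by the large non-landslide area that dominates most scenes.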

How to cite: Meena, S. R., Floris, M., and Catani, F.: Can AI-generated landslide inventories replace humans' cognitive abilities in hazard and risk scenarios?, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-9623, https://doi.org/10.5194/egusphere-egu23-9623, 2023.

X3.39 | EGU23-12820 | GM3.3
Using PG_TRIGRS with benchmark datasets for landslide susceptibility zonation in central Italy
(withdrawn)
Diana Salciarini and Evelina Volpe

Posters virtual: Tue, 25 Apr, 10:45–12:30 | vHall SSP/GM

Chairpersons: Massimiliano Alvioli, Marco Loche
vSG.1 | EGU23-4851 | GM3.3
Neelima Satyam, Minu Treesa Abraham, and Kunal Gupta

The use of machine learning (ML) approaches for developing landslide susceptibility maps (LSM) has gained wide popularity in the recent past. The choice of ML algorithm, the spatial resolution, the ratio of train-to-test data, and the landslide conditioning factors are some of the crucial factors that decide the performance of the developed LSM. However, there are no formal guidelines on the selection of any of these factors, as the choice depends strongly on the study area. In most cases, site-specific comparative analyses are required to find the best-suited combination. Two case studies were conducted for parts of the Western Ghats in India to develop pixel-based LSM for the Idukki and Wayanad districts. Five different ML algorithms, two different spatial resolutions, multiple train-to-test ratios and two different types of landslide inventory data were used for developing the best-suited LSM. After detailed analysis, it was observed that the random forest (RF) algorithm resulted in the best-performing LSM for both regions. The effects of spatial resolution and data splitting were found to differ between algorithms, and among all the factors considered, data splitting was found to be the least influential.
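The train-to-test ratio discussed above can be illustrated with a small stratified split, which keeps the share of landslide and non-landslide samples the same in both subsets. This is a generic sketch, not the procedure used in the study; the function name, the seed, and the 30/70 class balance in the usage example are hypothetical.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices), split separately within each class."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    # Hypothetical sample: 30 landslide pixels (1) and 70 stable pixels (0).
    labels = [1] * 30 + [0] * 70
    for fraction in (0.2, 0.3):
        train, test = stratified_split(labels, test_fraction=fraction)
        print(fraction, len(train), len(test))
```

Repeating the split for several fractions and seeds is one simple way to check how sensitive a given algorithm is to data splitting.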

How to cite: Satyam, N., Abraham, M. T., and Gupta, K.: Resolution of data, type of inventory and data splitting in machine learning-based landslide susceptibility mapping, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4851, https://doi.org/10.5194/egusphere-egu23-4851, 2023.

vSG.2 | EGU23-6937 | GM3.3 | ECS
Neha Gupta, Debi Prasanna Kanungo, and Josodhir Das

High-magnitude earthquakes in seismic zones often initiate a cascading chain of hazards such as co-seismic landslides, soil liquefaction, snow avalanches, surface faulting, devastating rock avalanches, and ground shaking. In the present study, a co-seismic landslide susceptibility analysis was carried out for the Bhagirathi valley of the Uttarakhand Himalayan region using machine learning techniques at the slope-unit level. The study area falls in seismic zone IV; rocks along the fault zone are fragile, and the area is seismically very active. The region previously experienced the Uttarkashi earthquake (1991) of magnitude 6.6. Assessment of seismically induced landslides is a complex process, as it considers both static parameters (causative factors) and dynamic parameters (triggering factors) in the form of ground motion shaking effects. In this study, co-seismic landslide susceptibility maps were produced at the slope-unit mapping level using two machine learning techniques, Extreme Gradient Boosting (XGBoost) and Naïve Bayes (NB). The landslide inventory, with 3,000 delineated polygons, was split into training (80%) and testing (20%) data to calibrate and validate the models. The static causative factors considered were slope, aspect, curvature, lineament buffer, drainage buffer, geology, topographic wetness index, and normalized difference vegetation index (NDVI); these parameters were generated using CartoDEM and satellite data. As the triggering factor, Arias Intensity (AI) was used to represent ground motion shaking as a dynamic factor for co-seismic landslide susceptibility mapping. Arias Intensity was computed using the classical Cornell approach, considering the earthquake catalogue between the years 1700 and 2022.
Finally, XGBoost and NB were used to compute a static landslide susceptibility map and a dynamic co-seismic landslide susceptibility map for a 475-year return period. XGBoost at the slope-unit level gave the better results. These results were validated using the seismic relative index (SRI) and the landslide density method. The prepared map can be useful for local and regional planning.

Keywords: Co-seismic landslide, Slope Unit, Landslide mapping, Machine learning.

How to cite: Gupta, N., Kanungo, D. P., and Das, J.: Co-seismic landslide susceptibility analysis for the Bhagirathi valley of Uttarakhand Himalayan region using machine learning algorithms based on Slope unit techniques, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6937, https://doi.org/10.5194/egusphere-egu23-6937, 2023.

vSG.3 | EGU23-9988 | GM3.3
Paraskevas Tsangaratos, Ioanna Ilia, and Aikaterini-Alexandra Chrysafi

Landslides are considered one of the most significant geohazards, with a great impact on the man-made and natural environment. A search of the scientific literature shows that the most studied topic in landslide assessment is the identification of areas that may exhibit instability, by modelling the influence of landslide-related variables with knowledge-driven and data-driven methods and techniques. This is not an easy task, since the complexity of the processes responsible for the evolution of landslides, which are in most cases unknown and are triggered by either natural or man-made activities, influences the performance of these methods. Landslide susceptibility assessments, which model the spatial component of the evolution of landslides, are the most reliable investigation tool capable of predicting the spatial dimension of the phenomenon with high accuracy. During the past two decades, artificial intelligence methods, and specifically machine learning algorithms, have dominated landslide susceptibility assessments as the main sophisticated methods of analysis. Fuzzy logic algorithms, decision trees, artificial neural networks, ensemble methods and evolutionary population-based algorithms were among the most advanced methods that proved to be reliable and accurate.

In this context, the main objective of the present study was to compare the performance of various Machine Learning models (MLm) in landslide susceptibility assessments. The methodology followed a five-phase procedure: (i) creating the inventory map, (ii) selecting, classifying, and weighting the landslide-related variables, (iii) performing a multicollinearity analysis and an importance analysis, (iv) implementing the developed methodology and testing the produced models, and (v) comparing the predictive performance of the various models. The computations were coded in R and Python, whereas ArcGIS 10.5 was used for compiling the data and producing the landslide susceptibility maps.

In more detail, Logistic Regression, Support Vector Machines, Random Forest, and Artificial Neural Network were implemented, and their predictive performance was compared. The efficiency of the MLm was estimated for an area in the northwestern Peloponnese region, Greece, characterized by the presence of numerous landslides. Twelve landslide-related variables (elevation, slope angle, aspect, plan and profile curvature, topographic wetness index, lithology, silt, sand and clay content, distance to faults, and distance to river network) and 128 landslide locations were used to produce the training and test datasets. The Certainty Factor was implemented to calculate the correlation among the landslide-related variables and to assign a weight value to each variable class. Multicollinearity analysis was used to estimate the existence of collinearity among the landslide-related variables. Learning Vector Quantization (LVQ) was used for ranking features by importance, whereas the evaluation process involved estimating the predictive ability of the MLm via the classification accuracy, the sensitivity, the specificity and the area under the success and predictive rate curves (AUC). Overall, the outcome of the study indicates that all MLm provided highly accurate results, with the Artificial Neural Network approach being the most accurate, followed by Random Forest, Support Vector Machines and Logistic Regression.
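The evaluation metrics listed above can be sketched in a few lines. Note the assumption: the code below computes sensitivity, specificity, and a ROC-style AUC from per-sample scores, which is a related but different construction from the success and prediction rate curves used in the study (those rank the study area by susceptibility and accumulate landslide area). Function names and the 0.5 threshold are hypothetical choices for the example.

```python
def sensitivity_specificity(scores, reference, threshold=0.5):
    """Sensitivity and specificity after thresholding susceptibility scores."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, r in zip(predicted, reference) if p == 1 and r == 1)
    fn = sum(1 for p, r in zip(predicted, reference) if p == 0 and r == 1)
    tn = sum(1 for p, r in zip(predicted, reference) if p == 0 and r == 0)
    fp = sum(1 for p, r in zip(predicted, reference) if p == 1 and r == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must contain both classes")
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(scores, reference):
    """ROC AUC as the probability that a landslide sample outranks a stable one.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, r in zip(scores, reference) if r == 1]
    negatives = [s for s, r in zip(scores, reference) if r == 0]
    if not positives or not negatives:
        raise ValueError("reference must contain both classes")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, 0.1]
    reference = [1, 0, 1, 0]
    print(sensitivity_specificity(scores, reference))
    print(roc_auc(scores, reference))
```

The pairwise AUC loop is quadratic in the number of samples, which is acceptable for a dataset of the size described here but would be replaced by a rank-based formula for large rasters.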

How to cite: Tsangaratos, P., Ilia, I., and Chrysafi, A.-A.: Comparing the performance of Machine Learning Methods in landslide susceptibility modelling, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-9988, https://doi.org/10.5194/egusphere-egu23-9988, 2023.