EGU23-13362, updated on 09 Jan 2024
https://doi.org/10.5194/egusphere-egu23-13362
EGU General Assembly 2023
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Exploring the benchmark dataset for tasks related to landslide susceptibility assessment

Jewgenij Torizin and Nick Schüßler
  • Institute for Geosciences and Natural Resources (BGR), Hannover, Germany (jewgenij.torizin@bgr.de)

In the presented study, we investigate the possibilities of performing tasks related to landslide susceptibility assessment (LSA) on the provided benchmark dataset. The slope unit-based dataset consists of aggregated predisposing factors and two label sets. Although initially introduced as a dataset for binary classification tasks, it is also suitable for zoning and regression analysis in combination with the underlying landslide inventory. Zoning ranks slope units to delineate the study area into susceptibility zones. In the regression analysis, we try to predict a numeric target value (e.g., landslide count) from the slope unit's attributes.

We explored the benchmark dataset using bivariate and multivariate statistical visualization techniques to better understand the relations in the data. We found the dataset at this stage insufficient for achieving a well-explainable, high-performance classification using linear models. Most attributes are not specific enough to linearly separate the given labels. The chosen summary statistics (mean and standard deviation) may not sufficiently characterize the parameter distributions inside a slope unit.
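The limitation of mean and standard deviation as slope unit descriptors can be illustrated with a minimal sketch. The two sets of slope angles below are hypothetical values chosen for illustration and are not taken from the benchmark dataset; the 25 degree threshold is likewise an arbitrary assumption.

```python
import math
from statistics import mean, pstdev

# Hypothetical slope angles (degrees) of the raster cells inside two slope units.
# These values are illustrative assumptions, not benchmark data.
unit_a = [10.0, 10.0, 30.0, 30.0]
offset = 10.0 * math.sqrt(2.0)
unit_b = [20.0 - offset, 20.0, 20.0, 20.0 + offset]

def share_above(values, threshold):
    """Fraction of cells with a slope angle above the threshold."""
    return sum(v > threshold for v in values) / len(values)

mean_a, std_a = mean(unit_a), pstdev(unit_a)
mean_b, std_b = mean(unit_b), pstdev(unit_b)

# Both units aggregate to (nearly) the same mean and standard deviation ...
print(round(mean_a, 6), round(std_a, 6))
print(round(mean_b, 6), round(std_b, 6))

# ... although the share of steep cells differs between them.
steep_a = share_above(unit_a, 25.0)
steep_b = share_above(unit_b, 25.0)
print(steep_a, steep_b)
```

Both hypothetical units reduce to the same pair of aggregated attributes, so a model that sees only those two values cannot distinguish them, even though half of the cells in the first unit exceed the threshold and only a quarter do in the second.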

We propose a theoretical concept for zonation analysis to assess the best possible performance on the given discrete dataset using the success rate curve as the model evaluation metric. Because no applied algorithm can modify the geometry of the discrete slope units, the evaluation metric depends only on the relative ranking of the slope units. The best performance is therefore obtainable without computing a predictive model. For frequency-related models (weighting of factors with landslide count statistics), a simple direct computation of conditional probabilities or frequency ratios on the slope units, used as a ranking factor, provides the best possible ranking. For binary labels, combining the label and the slope unit's area provides the best slope unit ranking.
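The ranking argument can be sketched on a toy example. The code below is a minimal illustration, not the notebook's implementation. It assumes the success rate curve plots the cumulative landslide fraction against the cumulative area fraction of the ranked slope units, and it uses landslide count per unit area as a stand-in ranking factor. The areas, counts, and function names are hypothetical.

```python
from itertools import permutations

def success_rate_auc(order, areas, counts):
    """Area under the success rate curve for a given ordering of slope units.

    x axis: cumulative fraction of the study area,
    y axis: cumulative fraction of landslides (trapezoidal integration).
    """
    total_area = sum(areas)
    total_count = sum(counts)
    x_prev = y_prev = 0.0
    auc = 0.0
    for i in order:
        x = x_prev + areas[i] / total_area
        y = y_prev + counts[i] / total_count
        auc += (x - x_prev) * (y + y_prev) / 2.0
        x_prev, y_prev = x, y
    return auc

def rank_by_density(areas, counts):
    """Order slope units by landslide count per unit area, highest first."""
    return sorted(range(len(areas)),
                  key=lambda i: counts[i] / areas[i],
                  reverse=True)

# Hypothetical slope units (illustrative values only).
areas = [4.0, 1.0, 2.5, 3.0, 0.5]
counts = [2, 3, 0, 1, 1]

density_order = rank_by_density(areas, counts)
density_auc = success_rate_auc(density_order, areas, counts)

# Exhaustive check over every possible ranking of the five units.
best_auc = max(success_rate_auc(p, areas, counts)
               for p in permutations(range(len(areas))))

print(density_order)
print(density_auc, best_auc)
```

In this toy case, no permutation of the slope units yields a larger area under the curve than the direct density ranking, which is obtained without fitting any model. Under the same assumptions, binary labels reduce the counts to 0 or 1, so labeled units are ordered by ascending area ahead of all unlabeled units.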

We conducted regression and classification analyses with artificial neural networks (ANN), testing different combinations of parameters (sensitivity analysis) and architectures that allow for modeling nonlinear relations. In both analyses, initial results show that a complex net architecture can boost the model fit on the training dataset at the cost of predictive performance on the test data. The findings of the dataset pre-exploration also correspond well with the ANN sensitivity analysis. The number of parameters is reducible to a few effective predictors without losing much classification accuracy, which is poor to moderate depending on the label set used.

While slope units as an aggregation unit for geomorphological analyses remain undisputed, the proposed aggregation of predisposing factors in slope units at the entry point of the analysis needs further discussion. Aggregating the results of a raster-based LSA to overcome deviations in landslide susceptibility patterns caused by data uncertainties or different methods could be more suitable at this point. Slope units should be analyzed with regression analysis in LSA to consider their different spatial extents during the calculation.

We provide our scripts, visualizations, and results as a Jupyter Notebook on our GitHub: https://github.com/BGR-EGHA/EGU23_GM3.3_ls_benchmark.

How to cite: Torizin, J. and Schüßler, N.: Exploring the benchmark dataset for tasks related to landslide susceptibility assessment, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13362, https://doi.org/10.5194/egusphere-egu23-13362, 2023.