EGU25-13938, updated on 15 Mar 2025
https://doi.org/10.5194/egusphere-egu25-13938
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Wednesday, 30 Apr, 10:45–12:30 (CEST), Display time Wednesday, 30 Apr, 08:30–12:30
Hall X3, X3.58
Dataset Construction for Landslide Susceptibility Mapping Using Multi-Buffer Zones, Clustering, and Stratified Sampling
Paraskevas Tsangaratos1, Aikaterini-Alexandra Chrysafi2, Ploutarxos Tzampoglou3, Aristodemos Anastasiades4, Elena Valari5, Vasilis Giannoglou6, and Dimitrios Loukidis7
  • 1National Technical University of Athens, NTUA, School of Mining and Metallurgical Engineering, Section of Geological Science, ATHENS, Greece (ptsag@metal.ntua.gr)
  • 2National Technical University of Athens, NTUA, School of Mining and Metallurgical Engineering, Section of Geological Science, ATHENS, Greece (alexchrysafi@mail.ntua.gr)
  • 3Department of Civil & Environmental Engineering, University of Cyprus, 1678 Nicosia, Cyprus (tzampogloup@gmail.com)
  • 4Georama Ltd, 2021 Nicosia, Cyprus (a.anastasiades@georama.com.cy)
  • 5GeoImaging Ltd, 2021 Nicosia, Cyprus (elenvalarigeo@gmail.com)
  • 6GeoImaging Ltd, 2021 Nicosia, Cyprus (vasilis@geoimaging.com.cy)
  • 7Department of Civil & Environmental Engineering, University of Cyprus, 1678 Nicosia, Cyprus (loukidis.dimitrios@ucy.ac.cy)

Landslide susceptibility mapping is a vital tool for identifying areas vulnerable to slope instability and mitigating related hazards. A critical challenge in this process is constructing a robust, diverse, and balanced training dataset that accurately distinguishes landslide-prone areas from stable regions. This study proposes a methodology that integrates multi-buffer zoning, clustering-based sampling, and stratified sampling to enhance predictive accuracy and dataset representativeness.

The study was conducted in the Paphos district of Cyprus, an area of 552 km² that has experienced over 1,800 recorded landslides. The region’s geomorphological complexity, shaped by diverse topographic, geological, hydrological, and land-use conditions, makes it an ideal setting for advancing landslide susceptibility mapping techniques. A comprehensive dataset incorporating key environmental variables—such as slope, elevation, curvature, lithology, proximity to faults, and land cover—was compiled for analysis.

To develop the training dataset, documented landslide points were paired with non-landslide points generated from three spatial buffer zones: 250 m, 500 m, and 750 m around landslide sites. To further improve data diversity, clustering-based sampling grouped data points based on geomorphological and environmental similarities, while stratified sampling ensured proportional representation of critical variables in the dataset.
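
The buffer and stratification steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that a non-landslide point is accepted only if it lies at least the buffer distance from every recorded landslide, an interpretation the abstract does not spell out. It also assumes projected coordinates in metres. All function names, coordinates, and lithology labels are hypothetical, and the clustering step is not covered.

```python
import math
import random
from collections import defaultdict


def sample_non_landslide_points(landslides, bounds, buffer_m, n_points,
                                seed=0, max_tries=100_000):
    """Rejection-sample points at least buffer_m metres from every landslide.

    ASSUMPTION: the buffer acts as an exclusion zone around landslide sites.
    """
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bounds
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == n_points:
            break
        x, y = rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)
        if all(math.hypot(x - lx, y - ly) >= buffer_m for lx, ly in landslides):
            accepted.append((x, y))
    return accepted


def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from each stratum so proportions are preserved."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)
    sample = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical landslide locations and study-area bounding box (metres).
landslides = [(1000.0, 1000.0), (3000.0, 2500.0)]
bounds = (0.0, 0.0, 5000.0, 5000.0)

# One balanced set of non-landslide points per buffer distance.
datasets = {
    b: sample_non_landslide_points(landslides, bounds, b, len(landslides))
    for b in (250, 500, 750)
}

# Hypothetical records stratified by a lithology label.
records = [("marl", i) for i in range(60)] + [("chalk", i) for i in range(40)]
subset = stratified_sample(records, key=lambda r: r[0], fraction=0.5)
```

Each buffer distance yields its own candidate dataset, so models can be trained and compared per buffer. Sampling the same fraction from each stratum keeps the 60/40 split of the hypothetical lithology labels in the subset.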

Three machine learning models—Logistic Regression (LR), Random Forest (RF), and XGBoost—were employed to evaluate the predictive performance of datasets constructed using individual buffer zones, clustering, and stratification techniques. Model performance was assessed using metrics such as Accuracy, F1 Score, Cohen’s Kappa, and Area Under the Curve (AUC) to determine the effectiveness of each dataset.
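
For reference, the four metrics can be computed from binary labels and predicted scores as in the sketch below. It uses the standard textbook definitions for a binary problem, with landslide as the positive class (1). The 0.5 decision threshold, the function names, and the toy labels and scores are illustrative assumptions and are not taken from the study.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cohen_kappa(tp, tn, fp, fn):
    """Observed agreement corrected for the agreement expected by chance."""
    n = tp + tn + fp + fn
    p_o = (tp + tn) / n
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (p_o - p_e) / (1 - p_e) if p_e != 1 else 0.0


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney formulation of the AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy example: four landslide (1) and four non-landslide (0) points.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
results = {
    "accuracy": accuracy(tp, tn, fp, fn),
    "f1": f1_score(tp, fp, fn),
    "kappa": cohen_kappa(tp, tn, fp, fn),
    "auc": roc_auc(y_true, scores),
}
```

Accuracy, F1 Score, and Cohen's Kappa depend on the chosen threshold. AUC is computed from the raw scores and summarizes ranking quality across all thresholds.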

The results revealed clear distinctions between datasets. The 750 m buffer dataset outperformed the others, with XGBoost achieving an Accuracy of 93.92%, F1 Score of 93.86%, Cohen’s Kappa of 87.84%, and AUC of 98.36%. This dataset effectively captured stable environmental conditions, improving model robustness and generalizability. The 500 m buffer dataset also performed well, with XGBoost achieving an Accuracy of 92.36% and an AUC of 97.66%, while the 250 m buffer dataset exhibited slightly lower performance, with XGBoost achieving an Accuracy of 89.36% and an AUC of 95.77%.
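
As a quick consistency check on the 750 m figures, Cohen's Kappa can be recovered from Accuracy once the chance agreement is known. The sketch below assumes a chance agreement of 0.5, which holds when the two classes and the two predicted labels are evenly split. That balance is an assumption made here for illustration and is not stated in the abstract. The function name is hypothetical.

```python
def kappa_from_accuracy(accuracy, chance_agreement=0.5):
    """Cohen's Kappa from observed agreement (accuracy) and chance agreement.

    ASSUMPTION: chance_agreement = 0.5, i.e. balanced classes and predictions.
    """
    return (accuracy - chance_agreement) / (1.0 - chance_agreement)


# Reported XGBoost Accuracy for the 750 m buffer dataset.
kappa_750 = kappa_from_accuracy(0.9392)
print(f"{100 * kappa_750:.2f}%")
```

Under this assumption the formula reduces to 2 × Accuracy − 1, giving 87.84% for an Accuracy of 93.92%. This matches the reported Kappa and is consistent with, though not proof of, a balanced 750 m dataset.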

The clustering-based sampling approach also demonstrated strong results, with RF achieving an Accuracy of 92.44% and an AUC of 97.19%, suggesting that grouping data points based on shared characteristics enhances model precision. Finally, the combined dataset, which integrated clustering-based and stratified sampling, yielded robust results, with XGBoost achieving an Accuracy of 93.74%, Cohen’s Kappa of 85.99%, and AUC of 97.99%.

In conclusion, the proposed approach demonstrates the value of integrating multi-buffer zoning, clustering, and stratified sampling into susceptibility mapping frameworks. This study not only advances our understanding of landslide processes in the Paphos district but also provides a scalable, reliable methodology for landslide risk assessment in other regions, contributing to more resilient landscapes and communities.

This research was funded by the European Commission, project reference: ENTERPRISES/0223/Sub-Call1/0229.

How to cite: Tsangaratos, P., Chrysafi, A.-A., Tzampoglou, P., Anastasiades, A., Valari, E., Giannoglou, V., and Loukidis, D.: Dataset Construction for Landslide Susceptibility Mapping Using Multi-Buffer Zones, Clustering, and Stratified Sampling, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-13938, https://doi.org/10.5194/egusphere-egu25-13938, 2025.