EGU22-6795
https://doi.org/10.5194/egusphere-egu22-6795
EGU General Assembly 2022
© Author(s) 2022. This work is distributed under
the Creative Commons Attribution 4.0 License.

Effect of data splitting and selection of machine learning algorithms for landslide susceptibility mapping

Minu Treesa Abraham1, Neelima Satyam1, and Biswajeet Pradhan2
  • 1Department of Civil Engineering, Indian Institute of Technology Indore, Madhya Pradesh, 453552, India
  • 2Centre for Advanced Modelling and Geospatial Information Systems (CAMGIS), School of Civil and Environmental Engineering, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, PO Box 123, Australia

Landslide susceptibility maps (LSMs) are essential components of regional-scale landslide forecasting models. They provide the spatial probability of landslide occurrence and play a crucial role in the development and planning activities of any region. With the wide availability of satellite-based data and advanced computational facilities, data-driven LSMs are being developed for different regions across the world. Over the past decade, machine learning (ML) algorithms have gained wide acceptance for developing LSMs, and the performance of such maps depends strongly on the quality of the input data and the choice of ML algorithm. This study employs k-fold cross-validation to evaluate the performance of five ML models, viz., Naïve Bayes (NB), Logistic Regression (LR), Random Forest (RF), K-Nearest Neighbors (KNN) and Support Vector Machines (SVM), in developing LSMs while varying the train-to-test ratio. The ratio is varied by changing the number of folds used for k-fold cross-validation from 2 to 10, and its effect on each algorithm is assessed using Receiver Operating Characteristic (ROC) curves and accuracy values. The method is tested for Wayanad district, Kerala, India, which is highly affected by landslides during the monsoon. The results show that the RF algorithm performs best among the five algorithms considered, and that the maximum accuracy values were obtained with k = 8 in all cases. The difference between the minimum and maximum accuracy values was found to be 0.6 %, 0.74 %, 1.71 %, 1.92 % and 1.83 % for NB, LR, KNN, RF and SVM, respectively.
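The link between the number of folds and the train-to-test ratio can be illustrated with a minimal sketch. In k-fold cross-validation each round holds out one fold for testing, so the test share is 1/k and the training share is (k - 1)/k. The function names below are hypothetical and are not taken from the study. The contiguous, unshuffled folds are a simplifying assumption, since the abstract does not state how the samples were assigned to folds.

```python
from fractions import Fraction


def train_test_fractions(k):
    """Return the (train, test) shares of the data for one round of k-fold CV."""
    if k < 2:
        raise ValueError("k-fold cross-validation needs k >= 2")
    test = Fraction(1, k)
    return 1 - test, test


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) lists for k contiguous, near-equal folds.

    Assumption: no shuffling or stratification; the first
    (n_samples % k) folds receive one extra sample.
    """
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


if __name__ == "__main__":
    # Range of k explored in the abstract: 2 to 10 folds.
    for k in range(2, 11):
        train, test = train_test_fractions(k)
        print(f"k={k:2d}  train={float(train):.1%}  test={float(test):.1%}")
```

Under these assumptions, k = 2 corresponds to a 50:50 split, k = 10 to a 90:10 split, and k = 8 (the value at which the abstract reports the maximum accuracy) to 87.5:12.5.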

How to cite: Abraham, M. T., Satyam, N., and Pradhan, B.: Effect of data splitting and selection of machine learning algorithms for landslide susceptibility mapping, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-6795, https://doi.org/10.5194/egusphere-egu22-6795, 2022.

Corresponding displays formerly uploaded have been withdrawn.