Cross validation technique preference for landslide susceptibility zoning based on slope unit and machine learning workflow
- 1 Universitas Gadjah Mada, Department of Environmental Geography, Faculty of Geography, Yogyakarta, Indonesia (guruh.samodra@ugm.ac.id)
- 2 Universitas Gadjah Mada, Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences
- 3 Universitas Gadjah Mada, Department of Mathematics, Faculty of Mathematics and Natural Sciences
Numerous advanced techniques, including machine learning models, are widely used in landslide susceptibility zoning and often achieve very high accuracy. In some cases, very high accuracy is a sign of overfitting: the model fits the training data very well but performs poorly on test or new data. Cross validation (CV) strategies are often employed to reduce overfitting in machine learning models, and several CV techniques have recently been developed as part of the machine learning workflow. However, it is still unclear when one CV method should be preferred over another in landslide susceptibility zoning. To illustrate this issue, the authors implement a non-CV approach, standard V-fold CV, and several spatial CV techniques on a benchmark dataset from Italy to train, validate, and test an XGBoost model using 26 landslide controlling factors. Differences among the validation ROC, the testing ROC, and the confusion matrix were used to detect potential model overfitting. The choice of CV technique for the Italian benchmark dataset is discussed further. The results are expected to provide guidance for choosing a CV technique in landslide susceptibility zoning based on slope units and a machine learning workflow.
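The contrast between standard V-fold CV and spatial CV, and the use of ROC results to flag overfitting, can be illustrated with a minimal sketch. It assumes that slope units are identified by IDs with (x, y) centroids and that spatial CV is done by grouping units into square blocks, which is only one of several spatial CV variants; the abstract does not specify which variants were used. The names `random_vfold`, `spatial_block_folds`, `roc_auc`, and `overfitting_gap`, the block size, and the toy grid are all hypothetical. XGBoost training is not shown, and the AUC is computed with the rank (Mann-Whitney) formulation rather than any particular library's routine.

```python
import random
from collections import Counter, defaultdict


def random_vfold(unit_ids, v, seed=0):
    """Standard V-fold CV: shuffle slope-unit IDs and deal them into v folds."""
    ids = list(unit_ids)
    random.Random(seed).shuffle(ids)
    return {uid: i % v for i, uid in enumerate(ids)}


def spatial_block_folds(centroids, block_size, v, seed=0):
    """Hypothetical spatial CV variant: group slope units into square blocks
    by centroid, then assign whole blocks to folds so that neighbouring
    units never end up split between training and validation folds."""
    blocks = defaultdict(list)
    for uid, (x, y) in centroids.items():
        blocks[(int(x // block_size), int(y // block_size))].append(uid)
    keys = sorted(blocks)
    random.Random(seed).shuffle(keys)
    folds = {}
    for i, key in enumerate(keys):
        for uid in blocks[key]:
            folds[uid] = i % v
    return folds


def roc_auc(labels, scores):
    """Area under the ROC curve: probability that a random landslide unit
    (label 1) scores higher than a random non-landslide unit (label 0);
    ties count one half."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def overfitting_gap(validation_auc, test_auc):
    """Positive values mean the model looks better in validation than on
    held-out test data, one possible symptom of overfitting."""
    return validation_auc - test_auc


if __name__ == "__main__":
    # Hypothetical 10 x 10 grid of slope-unit centroids (arbitrary map units).
    centroids = {i: (float(i % 10), float(i // 10)) for i in range(100)}
    random_folds = random_vfold(centroids, v=5)
    spatial_folds = spatial_block_folds(centroids, block_size=2.0, v=5)
    print("random fold sizes:", sorted(Counter(random_folds.values()).items()))
    print("spatial fold sizes:", sorted(Counter(spatial_folds.values()).items()))
```

Both schemes give folds of equal size here, but in the spatial scheme adjacent slope units share a fold, so validation scores are less inflated by spatial autocorrelation. A large positive `overfitting_gap` between the mean validation AUC and the test AUC would be one indicator of the overfitting the abstract sets out to detect.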
How to cite: Samodra, G., Wahyudi, E. E., and Susyanto, N.: Cross validation technique preference for landslide susceptibility zoning based on slope unit and machine learning workflow, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-11051, https://doi.org/10.5194/egusphere-egu23-11051, 2023.