EGU2020-12265, updated on 12 Jun 2020
https://doi.org/10.5194/egusphere-egu2020-12265
EGU General Assembly 2020
© Author(s) 2020. This work is distributed under
the Creative Commons Attribution 4.0 License.

Landslide Susceptibility Assessment Considering Imbalanced Data: Comparison of Random Forest and Multi-Layer Perceptron

JungHyun Lee1, HyuckJin Park2, DongJun Lee3, and SooHyeon Lim4
  • 1Dept. of Geoinformation Engineering, Sejong University, Seoul, Republic of Korea (jhlee6086@gmail.com)
  • 2Dept. of Energy Resources and Geosystems Engineering, Sejong University, Seoul, Republic of Korea (hjpark@sejong.ac.kr)
  • 3Dept. of Geoinformation Engineering, Sejong University, Seoul, Republic of Korea (junman882@naver.com)
  • 4Dept. of Geoinformation Engineering, Sejong University, Seoul, Republic of Korea (lsooh0612@gmail.com)

Landslide prediction analyzes landslide-related factors and their correlations, either physically or mathematically. Many studies have used statistical methods to examine the relationships between landslide locations and related factors such as topography and geology. Existing statistical methods produce errors because of the variety and uncertainty of the input data. Recently, machine learning techniques that draw on artificial intelligence and big data have been proposed to improve the accuracy and efficiency of landslide prediction and management. Landslides are caused by nonlinear relationships among potential related factors, combined with the effects of triggering factors such as meteorological events or man-made damage. This study seeks to improve prediction performance by using machine learning models suited to the nonlinear correlations among related factors.
Generally, landslides occur in very small numbers across wide study areas. Constructing a predictive model with machine learning requires information about both landslide and non-landslide locations. However, if data from the entire study area are used, the prediction results are not reliable because they are dominated by the non-landslide information. Therefore, to minimize over-fitting or under-fitting due to data imbalance, an appropriate sampling ratio of landslide to non-landslide data should be considered.
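As a minimal sketch of what choosing a sampling ratio can look like, the Python snippet below keeps every landslide cell and randomly draws a fixed number of non-landslide cells per landslide cell. The function name `undersample`, the 1,000-cell study area, and the ratios tried are illustrative assumptions; the abstract does not state which sampling scheme or ratios were used.

```python
import random

def undersample(labels, ratio, seed=0):
    """Return sorted indices that keep every landslide cell (label 1) and
    `ratio` randomly chosen non-landslide cells (label 0) per landslide cell.
    Hypothetical helper for illustration only."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Never request more non-landslide cells than exist.
    n_neg = min(len(neg), int(round(ratio * len(pos))))
    return sorted(pos + rng.sample(neg, n_neg))

# Hypothetical study area: 10 landslide cells among 1,000 cells.
labels = [1] * 10 + [0] * 990

for ratio in (1, 2, 5):
    idx = undersample(labels, ratio)
    n_pos = sum(labels[i] for i in idx)
    print(f"1:{ratio} -> {n_pos} landslide, {len(idx) - n_pos} non-landslide")
```

The returned indices can then be used to select the matching rows of the factor table before training, so the same subset is applied to inputs and labels.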
In this study, landslide prediction was performed using two machine learning models, Random Forest (RF) and Multi-Layer Perceptron (MLP). RF builds multiple decision trees and merges them to obtain a more accurate and stable prediction. The RF model also yields variable importance, which indicates which variables have the most predictive power. This value is used to identify the characteristics of the related factors and to select the factors to be used for landslide prediction. MLP is a feedforward neural network with one or more hidden layers between the input and output layers. It consists of at least three layers of nodes, and each node except the input nodes is a neuron that uses a nonlinear activation function. It can therefore distinguish data that are not linearly separable. This model is used to analyze the nonlinearly correlated landslide data, taking into account the importance of the factors and the sampling ratio, and to verify the results.
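To illustrate why a hidden layer with a nonlinear activation can separate data that no single straight line can, the sketch below evaluates a tiny three-layer network on the XOR problem, the textbook case of data that are not linearly separable. The weights are chosen by hand rather than trained, ReLU is an assumed activation, and XOR stands in for landslide data purely for illustration; none of this reflects the network configuration used in the study.

```python
def relu(z):
    """Nonlinear activation: passes positive values, clips negatives to 0."""
    return max(0.0, z)

def mlp_xor(x1, x2):
    """Forward pass of a 2-2-1 network with hand-picked weights."""
    # Hidden layer: two neurons, each a weighted sum followed by ReLU.
    h1 = relu(x1 + x2)
    h2 = relu(x1 + x2 - 1.0)
    # Output layer: linear combination of the hidden activations.
    return h1 - 2.0 * h2

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), mlp_xor(x1, x2))
```

Without the ReLU step the two layers would collapse into a single linear function of the inputs, which cannot output 1 for the two mixed inputs and 0 for the two matching ones.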
This study aims to compare the results (susceptibility index) obtained with Random Forest and Multi-Layer Perceptron as the sampling ratio changes, and to verify the performance of each model.
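The abstract does not name the metrics used to verify model performance. As one assumption, the sketch below contrasts accuracy with recall on hypothetical imbalanced labels, to show why a comparison across sampling ratios may need more than accuracy alone. The label counts and the "always non-landslide" predictor are invented for illustration.

```python
def accuracy_and_recall(y_true, y_pred):
    """Compute accuracy and landslide recall from binary labels (1 = landslide)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Recall: share of true landslide cells that the model found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall

# Hypothetical study area: 10 landslide cells among 1,000 cells.
y_true = [1] * 10 + [0] * 990
# A degenerate model that labels every cell as non-landslide.
always_stable = [0] * 1000

acc, rec = accuracy_and_recall(y_true, always_stable)
print(f"accuracy={acc:.2f}, recall={rec:.2f}")
```

Here the degenerate predictor scores high accuracy while finding no landslides at all, which is the kind of unreliable result that training on the full, imbalanced study area can produce.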

Acknowledgement: This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the High-Potential Individuals Global Training Program (2019-0-01561) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).

How to cite: Lee, J., Park, H., Lee, D., and Lim, S.: Landslide Susceptibility Assessment Considering Imbalanced Data: Comparison of Random Forest and Multi-Layer Perceptron, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12265, https://doi.org/10.5194/egusphere-egu2020-12265, 2020