EGU23-11586, updated on 26 Feb 2023
EGU General Assembly 2023
© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Bayesian logistic regression and optimized XGBoost models for landslide susceptibility assessment

Benjamin Mirus1,2 and Jacob Woodard1
  • 1U.S. Geological Survey (USGS)
  • 2Swiss Federal Institute for Forest, Snow and Landscape Research (WSL)

Bayesian logistic regression with vague priors and optimized XGBoost models are two contrasting and commonly used approaches for modeling landslide susceptibility. Logistic regression models the log odds of a binary outcome (i.e., landslide or no landslide) as a function of predictor data (e.g., slope, elevation, and geology) describing each mapping unit into which the terrain is divided for susceptibility evaluation. The Bayesian implementation incorporates uncertainty into the model by treating the model parameters as probability distributions. Weakly informative priors ensure that the likelihood function (i.e., observational data) dominates the posterior distributions, which can be estimated using the statistical software Stan. Like logistic regression, the gradient boosting decision tree machine learning algorithm XGBoost takes the predictor data of each mapping unit and outputs the probability of an event. Decision trees are non-parametric learning tools that use a set of if-then-else decision rules to predict the expected model outcome. Gradient boosting sequentially adds decision trees to reduce the model residuals as far as possible while penalizing the complexity that each tree adds to the model. We optimize the XGBoost model parameters using a Bayesian cross-validation procedure on a portion of the training data. To obtain distributions of the level of susceptibility from XGBoost, a 10-fold cross-validation procedure with ten iterations is implemented. Both the Bayesian logistic regression and XGBoost models are evaluated using the area under the receiver operating characteristic curve and the Brier score, but any other common evaluation metric could be used. Model development and evaluation are carried out in the R computational environment.
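As an illustration of the quantities named above, the following minimal sketch converts log odds to a probability and computes the Brier score and the area under the receiver operating characteristic curve. It is written in plain Python rather than the R and Stan workflow the abstract describes, and the intercept, slope coefficient, slope angles, and observed outcomes are hypothetical values chosen only for demonstration, not results from this work.

```python
import math


def log_odds_to_probability(log_odds):
    """Map log odds to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical single-predictor model: log odds = intercept + coef * slope angle.
intercept = -4.0   # assumed value, for illustration only
slope_coef = 0.1   # assumed change in log odds per degree of slope

slopes_deg = [10.0, 20.0, 35.0, 45.0, 50.0]  # made-up mapping units
observed = [0, 0, 1, 0, 1]                   # made-up landslide presence/absence

probs = [log_odds_to_probability(intercept + slope_coef * s) for s in slopes_deg]

print("Brier score:", brier_score(observed, probs))
print("AUC:", roc_auc(observed, probs))
```

A lower Brier score indicates better-calibrated probabilities, while an AUC closer to 1 indicates better ranking of landslide over non-landslide units; the two metrics therefore assess complementary aspects of a susceptibility model.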
These methods have been applied with success to many diverse regions of the United States and would benefit from testing with the benchmark datasets proposed by the conveners.
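The repeated cross-validation scheme mentioned in the abstract could be sketched as follows. This assumes that "ten iterations" means ten repeats of 10-fold cross-validation with the folds reshuffled each time, which the abstract does not state explicitly. The function name `repeated_kfold_indices` and the sample count are hypothetical, and model fitting is left out: in practice each split would train one model, and the collection of predictions per mapping unit would form the susceptibility distribution.

```python
import random


def repeated_kfold_indices(n_samples, n_folds=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold cross-validation.

    Each repeat reshuffles the sample order, then deals the samples into
    n_folds disjoint test folds; the remaining samples form the training set.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        for fold in range(n_folds):
            test_idx = order[fold::n_folds]  # every n_folds-th shuffled sample
            test_set = set(test_idx)
            train_idx = [i for i in order if i not in test_set]
            yield repeat, fold, train_idx, test_idx


# 50 hypothetical mapping units -> 10 repeats x 10 folds = 100 train/test splits.
splits = list(repeated_kfold_indices(n_samples=50))
print("number of splits:", len(splits))
```

Under this assumed scheme, every mapping unit appears in exactly one test fold per repeat, so ten repeats give ten out-of-sample predictions per unit.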

How to cite: Mirus, B. and Woodard, J.: Bayesian logistic regression and optimized XGBoost models for landslide susceptibility assessment, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-11586, 2023.