EGU2020-8365, updated on 12 Jun 2020
EGU General Assembly 2020
© Author(s) 2020. This work is distributed under
the Creative Commons Attribution 4.0 License.

Random forest classification of morphology in the northern Gerecse (Hungary) to predict landslide-prone slopes

Gáspár Albert¹ and Dávid Gerzsenyi²
  • ¹ Eötvös Loránd University, Cartography and Geoinformatics, Budapest, Hungary
  • ² Eötvös Loránd University, Doctorate School of Earth Sciences, Budapest, Hungary

The morphology of the Gerecse Hills bears the imprints of the fluvial terraces of the Danube River, Neogene tectonism, and Quaternary erosion. The solid bedrock is composed of Mesozoic and Paleogene limestones, marls, and sandstones, and is covered by 115 m thick layers of unconsolidated Quaternary fluvial, lacustrine, and aeolian sediments. Hillslopes, stream valleys, and loessy riverside bluffs are prone to landslides, which have caused serious damage in inhabited and agricultural areas in the past. Attempts have been made to map these landslides since the 1970s, and the observations were documented in the National Landslide Cadastre (NLC) inventory. This documentation is sporadic, concentrates on only certain locations, and often describes the state and extent of the landslides inaccurately. The aim of the present study was to complete and correct the landslide inventory using quantitative modelling. In the 480 km² study area, all records of the inventory were revisited and corrected. Using objective criteria, the renewed records and additional sample locations were sorted into one of the following morphological categories: scarps, debris, transitional areas, stable accumulation areas, stable hilltops, and stable slopes. The categorized map of these observations served as training data for the random forest classification (RFC).
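Before a classifier can use the six categories, they have to be turned into class labels on the same grid as the predictor rasters. The sketch below is a hypothetical illustration, not the authors' encoding. The integer codes, the use of 0 for unlabelled cells, and the (row, column) sample format are all assumptions. The observations were mapped as polygons, but for brevity each one is treated here as a single cell.

```python
# Hypothetical integer codes for the six morphological categories (0 = unlabelled).
CATEGORIES = [
    "scarp",
    "debris",
    "transitional area",
    "stable accumulation area",
    "stable hilltop",
    "stable slope",
]
CODES = {name: code for code, name in enumerate(CATEGORIES, start=1)}


def rasterize_samples(samples, n_rows, n_cols):
    """Burn ((row, col), category) observations into a label grid; other cells stay 0."""
    grid = [[0] * n_cols for _ in range(n_rows)]
    for (row, col), category in samples:
        grid[row][col] = CODES[category]
    return grid


# Hypothetical field observations on a 3 x 3 grid.
samples = [((0, 1), "scarp"), ((1, 1), "debris"), ((2, 2), "stable slope")]
labels = rasterize_samples(samples, 3, 3)
for line in labels:
    print(line)
```

Cells left at 0 are the ones the classifier has to predict. Only the coded cells serve as training data.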

Random forest is a powerful multivariate classification method that combines the votes of many decision trees. In our case, predictions were made for each pixel of medium-resolution (~10 m) rasters. The predictor variables of the decision trees were morphometric and geological indices. The terrain indices were derived from the MERIT DEM with SAGA routines, and the categorized geological data come from a medium-scale geological map [1]. The predictor variables were packed into a multi-band raster, and the RFC was executed in R 3.5 with RStudio.
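The authors ran the RFC in R 3.5. The following standard-library Python sketch is only a language-neutral illustration of the same idea, not their code. Each pixel of a multi-band raster becomes one feature vector. Each tree in the forest is then grown on a bootstrap sample of the labelled pixels, using Gini-impurity splits and considering only a random subset of bands at each split. Finally, every pixel receives the majority vote of the trees.

The following details are all hypothetical: the two bands (slope and normalized height), the toy values, the number of trees, and the depth limit. A real workflow would use an established implementation, such as the randomForest package in R, rather than this code.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def best_split(X, y, features):
    """Return (impurity, feature, threshold) of the best split, or None."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2  # midpoint threshold between neighbouring values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, depth, n_feat, rng):
    """Grow a depth-limited tree; leaves are class labels, inner nodes are tuples."""
    if depth == 0 or len(set(y)) == 1:
        return majority(y)
    split = best_split(X, y, rng.sample(range(len(X[0])), n_feat))
    if split is None:
        return majority(y)
    _, f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], depth - 1, n_feat, rng),
            build_tree([X[i] for i in right], [y[i] for i in right], depth - 1, n_feat, rng))


def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, depth=4, seed=0):
    rng = random.Random(seed)
    n_feat = max(1, int(len(X[0]) ** 0.5))  # number of bands tried per split
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample of labelled pixels
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], depth, n_feat, rng))
    return forest


def predict_forest(forest, row):
    return majority([predict_tree(tree, row) for tree in forest])


def raster_to_pixels(bands):
    """Stack equally sized 2-D bands into one feature vector per pixel (row-major)."""
    rows, cols = len(bands[0]), len(bands[0][0])
    return [[band[r][c] for band in bands] for r in range(rows) for c in range(cols)]


# Hypothetical 2 x 4 raster: band 0 = slope (degrees), band 1 = normalized height.
slope = [[38, 35, 6, 5], [40, 36, 7, 8]]
norm_height = [[0.90, 0.80, 0.20, 0.30], [0.85, 0.82, 0.25, 0.22]]
labels = ["scarp", "scarp", "stable slope", "stable slope"] * 2

pixels = raster_to_pixels([slope, norm_height])
forest = fit_forest(pixels, labels)
prediction = [predict_forest(forest, p) for p in pixels]
print(prediction)
```

Drawing both a bootstrap sample and a random subset of bands is what separates a random forest from a single decision tree. Together, they decorrelate the trees, so their majority vote is more stable than any individual tree.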

After testing several combinations of predictor variables and two different categorisations of the geological data, the best prediction reached approximately 80% accuracy. The model was validated by calculating the ratio of correctly predicted pixels to the total cell count of the training data. The results show that landslide-prone slopes are probably not restricted to the areas recorded in the NLC inventory: according to the model, only ~6% of the predicted highly unstable slopes (scarps) in the study area fall within the NLC polygons.
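Both figures in this paragraph are simple per-pixel ratios. The sketch below illustrates them under a few assumptions. It treats the predicted classes, the observed training classes, and a boolean mask marking cells inside NLC polygons as flat per-pixel lists, with None for unlabelled cells in the observed list. The function names and the toy lists are hypothetical.

```python
def pixel_accuracy(predicted, observed):
    """Share of labelled pixels (observed is not None) whose predicted class is correct."""
    pairs = [(p, o) for p, o in zip(predicted, observed) if o is not None]
    return sum(p == o for p, o in pairs) / len(pairs)


def share_inside(predicted, inside_nlc, target="scarp"):
    """Share of pixels predicted as `target` that fall inside NLC polygons."""
    flags = [inside for p, inside in zip(predicted, inside_nlc) if p == target]
    return sum(flags) / len(flags) if flags else 0.0


# Hypothetical five-pixel example.
predicted = ["scarp", "debris", "scarp", "stable slope", "scarp"]
observed = ["scarp", "debris", "debris", None, "scarp"]
inside_nlc = [True, False, False, True, False]

print(pixel_accuracy(predicted, observed))  # 0.75
print(share_inside(predicted, inside_nlc))
```
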

The project was partly supported by the Thematic Excellence Program, Industry and Digitization Subprogram, NRDI Office, project no. ED_18-1-2019-0030 (G. Albert), and by the ÚNKP-19-3 New National Excellence Program of the Ministry for Innovation and Technology (D. Gerzsenyi).


[1] Gyalog, L. and Síkhegyi, F. (eds.): Geological map of Hungary (scale 1:100 000). Geological Institute of Hungary, Budapest, Hungary, 2005.

How to cite: Albert, G. and Gerzsenyi, D.: Random forest classification of morphology in the northern Gerecse (Hungary) to predict landslide-prone slopes, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-8365,, 2020

Display materials

Comments on the display material

AC: Author Comment | CC: Community Comment

Display material version 1 – uploaded on 03 May 2020
  • CC1: Comment on EGU2020-8365, Sijin Li, 06 May 2020

    Hi Albert! I am sorry that I missed your live session, but I read your whole presentation and the record of the chatroom. However, I still have three questions.

    Firstly, you mentioned on slide #6 that you used some objective criteria to separate the renewed NLC records into three morphological categories. Could you explain how this was done? Was it based on terrain derivatives or on other criteria?

    Secondly, in the “Conclusions”, you mentioned that “only ~6% of the highly unstable slopes (scarps) fall within the NLC polygons”. I’m not sure whether “highly unstable slopes” means the same as “areas with high surface slope values”. I think some geological features are also important for landslide research; for example, landslides might happen easily in the area of an anticline. Did you add such knowledge to your dataset?

    Finally, could you explain why you used wetness in the training step?

    Thanks for your attention! :)

    • AC1: Reply to CC1, Gáspár Albert, 06 May 2020

      Hi Sijin, thanks for the questions! The criteria for sorting the NLC polygons into 3 subcategories were based on field studies. For example, in the case of loess, we could distinguish the transitional area from the scarps based on the internal structure. Obviously, the position of the polygon on the slope plays an important role in this (e.g. we didn’t find scarps in a topographically lower position than debris), which is why we included the normalized height parameter.
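The normalized height parameter measures how high a cell sits relative to its surroundings, from 0 (lowest) to 1 (highest). SAGA derives it with its own relative-height algorithm. The sketch below is only a simplified stand-in: it rescales each cell's elevation between the minimum and maximum of a square moving window. The window radius and the fallback value of 0.5 on flat ground are assumptions.

```python
def normalized_height(dem, radius=1):
    """Rescale each cell between the min and max elevation of its moving window."""
    rows, cols = len(dem), len(dem[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [dem[i][j]
                      for i in range(max(0, r - radius), min(rows, r + radius + 1))
                      for j in range(max(0, c - radius), min(cols, c + radius + 1))]
            lo, hi = min(window), max(window)
            # 0 = lowest cell in the neighbourhood, 1 = highest; flat windows get 0.5
            out[r][c] = (dem[r][c] - lo) / (hi - lo) if hi > lo else 0.5
    return out


# Hypothetical 3 x 3 DEM rising from west (100 m) to east (120 m).
dem = [[100, 110, 120], [100, 110, 120], [100, 110, 120]]
nh = normalized_height(dem)
print(nh[1])
```

Under this measure, the observation in the reply that scarps were never found lower on the slope than debris corresponds to scarp cells having higher normalized height values.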

      The “highly unstable slopes” are the scarp regions (usually with steep slopes). Geological features were included as lithological parameters. You are suggesting structural parameters, which would be a good idea in some areas, but not here. The Gerecse Hills are not folded mountains! Everything is relatively horizontal, and the most landslide-prone formations are the unconsolidated Quaternary sediments.

      That also explains why we used the TWI (Topographic Wetness Index), which corresponds to the Quaternary formations: e.g. proluvial and colluvial sediments are easily distinguished with it.
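The topographic wetness index is classically defined as TWI = ln(a / tan β). Here a is the specific catchment area (upslope contributing area per unit contour width) and β is the local slope angle. Flat, convergent cells such as valley floors therefore score high, while steep, divergent cells score low. SAGA also offers a modified "SAGA Wetness Index", and the abstract does not say which variant was used, so this sketch covers the classic formula only. Computing a requires a flow-routing step, which is assumed to be done already. The small floor on tan β, which avoids division by zero on flat cells, is also an assumption.

```python
import math


def twi(specific_catchment_area, slope_deg, min_tan=1e-3):
    """Classic topographic wetness index ln(a / tan(beta))."""
    # Floor tan(beta) so perfectly flat cells do not divide by zero.
    tan_beta = max(math.tan(math.radians(slope_deg)), min_tan)
    return math.log(specific_catchment_area / tan_beta)


# Hypothetical cells: a gently sloping valley floor and a steep, short hillslope.
valley_floor = twi(1000.0, 2.0)
steep_slope = twi(10.0, 30.0)
print(round(valley_floor, 2), round(steep_slope, 2))
```
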

      So, to sum it up: field work is essential, and the variables that are appropriate for the modelling may vary from one study area to another.

      • CC2: Reply to AC1, Sijin Li, 06 May 2020

        Hi Albert! Thanks for your reply! Your explanation is really great, and I've learned a lot from your answers. I think I also need to consider the "position" that you mentioned in your first answer.

        I also have a small hint (I'm not sure if it is useful for you). I read some papers about positive and negative landforms (also called convex and concave landforms). Generally, the structure/surface in positive (convex) landform areas is more stable than in negative (concave) ones. I think this might mean the same as the "position" you mentioned. Some methods have been proposed to extract these two kinds of landform areas, and I think they might also be used to make our results more reasonable.

        • AC2: Reply to CC2, Gáspár Albert, 06 May 2020

          You are right! The positive/negative properties of landforms are essential in determining the most probable “position” of a landslide! And we did just that, using the curvature of the slope! If the curvature is positive, the landform is a ridge, a shoulder of the slope, or a summit area, while if it is negative, it can be a valley bottom, a sinkhole, or a rising slope.
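The sign rule described in this reply can be illustrated with a simple finite-difference curvature on a 3 × 3 window. The sign is chosen so that, as in the reply, positive values mean convex (ridge, shoulder, summit) and negative values mean concave (valley bottom, sinkhole). GIS packages differ in their curvature definitions and sign conventions, so this is a hypothetical sketch, not the curvature routine used in the study. The 10 m cell size and the tolerance for "planar" are assumptions.

```python
def curvature(dem, r, c, cell=10.0):
    """Negative Laplacian of elevation at interior cell (r, c); > 0 means convex."""
    d2x = (dem[r][c - 1] - 2 * dem[r][c] + dem[r][c + 1]) / cell ** 2
    d2y = (dem[r - 1][c] - 2 * dem[r][c] + dem[r + 1][c]) / cell ** 2
    return -(d2x + d2y)


def landform(curv, tol=1e-6):
    if curv > tol:
        return "convex"   # ridge, shoulder or summit
    if curv < -tol:
        return "concave"  # valley bottom or sinkhole
    return "planar"


# Hypothetical 3 x 3 DEMs (elevations in metres).
summit = [[0, 0, 0], [0, 5, 0], [0, 0, 0]]
sinkhole = [[5, 5, 5], [5, 0, 5], [5, 5, 5]]
plane = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
for name, dem in [("summit", summit), ("sinkhole", sinkhole), ("plane", plane)]:
    print(name, landform(curvature(dem, 1, 1)))
```
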