EGU25-11162, updated on 15 Mar 2025
https://doi.org/10.5194/egusphere-egu25-11162
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Friday, 02 May, 14:00–15:45 (CEST), Display time Friday, 02 May, 14:00–18:00 | Hall A, A.64
A multi-stage machine learning application for predicting vegetation distribution and its factors in river channels
Hitoshi Miyamoto and Naoya Maeda
  • Shibaura Institute of Technology, Civil Engineering, Tokyo, Japan (miyamo@shibaura-it.ac.jp)

This study developed a multi-stage machine learning (ML) model that predicts the following year's vegetation distribution from the current year's conditions. The target rivers were five large Japanese rivers: the Kinugawa, Edogawa, Yahagigawa, Shonaigawa, and Ibogawa. The model's explanatory and target variables were created for each river segment from DEMs (Digital Elevation Models) and river environment base maps. The model consisted of three ML stages that together predict the following year's vegetation distribution from the current vegetation distribution and topographical information. An advantage of the multi-stage design was that the third-stage vegetation distribution prediction model could be constructed according to the difficulty of prediction, as indicated by the second-stage classification result. XGB (eXtreme Gradient Boosting) was used as the ML algorithm, SHAP (SHapley Additive exPlanations) was used for factor analysis, and the F1 score with five-fold cross-validation was used to evaluate the model's accuracy. For the five target rivers, the multi-stage ML model achieved an F1 score of 0.8 or higher for all rivers except the Kinugawa River. Its F1 score was 10% higher than that of a conventional single ML model. The vegetation distribution probability map indicated that prediction accuracy was generally high but dropped near the boundary between the river's low water channel and the floodplain. SHAP analysis revealed three prominent factors for vegetation existence: (i) the relative height near the levee and in the center of the floodplain, (ii) the distance from the river water's edge near the low water channel, and (iii) the vegetation existence history at the boundary between the low water channel and the floodplain.
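The evaluation protocol named above (F1 score with five-fold cross-validation) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the one-feature threshold classifier (`fit_stump`, `predict_stump`) merely stands in for XGB, and the synthetic "relative height" data and the labeling rule are assumptions made purely for illustration.

```python
import random


def f1_score(y_true, y_pred):
    """F1 for the positive class (here 1 = vegetated, 0 = not vegetated)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # convention: F1 is 0 when there are no true positives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_f1(X, y, fit, predict, k=5, seed=0):
    """Train on k-1 folds, score F1 on the held-out fold, repeat for each fold."""
    scores = []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        y_pred = [predict(model, X[i]) for i in held_out]
        scores.append(f1_score([y[i] for i in held_out], y_pred))
    return sum(scores) / k, scores


# Hypothetical stand-in for XGB: threshold halfway between the class means
# of the first feature.
def fit_stump(X, y):
    pos = [x[0] for x, t in zip(X, y) if t == 1]
    neg = [x[0] for x, t in zip(X, y) if t == 0]
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    return {"threshold": (mean_pos + mean_neg) / 2,
            "positive_above": mean_pos > mean_neg}


def predict_stump(model, x):
    above = x[0] > model["threshold"]
    return int(above == model["positive_above"])


# Synthetic data (assumption): one feature, "relative height" in metres,
# with cells labeled vegetated when the height exceeds 1.0.
rng = random.Random(42)
heights = [rng.uniform(0.0, 3.0) for _ in range(200)]
X = [[h] for h in heights]
y = [1 if h > 1.0 else 0 for h in heights]

mean_f1, fold_scores = cross_validated_f1(X, y, fit_stump, predict_stump)
print("per-fold F1:", [round(s, 3) for s in fold_scores])
print("mean F1:", round(mean_f1, 3))
```

Averaging the per-fold scores gives a single figure of the kind quoted per river, while the individual fold scores show how stable that figure is across splits.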
These results suggest that combining the prediction map with factor analysis could identify the factors that significantly influence where vegetation recruits in a river course.
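To make the factor analysis more concrete, the sketch below computes exact Shapley values, the quantity that SHAP approximates, by brute-force enumeration of feature subsets. It is a simplified illustration under stated assumptions, not the procedure used in the study: the scoring function `toy_score`, its coefficients, the three feature names, and the single reference cell `baseline` are all hypothetical, and "absent" features are simply replaced by their baseline values.

```python
from itertools import combinations
from math import factorial


def mix(x, baseline, present):
    """Take features in `present` from x and all other features from the baseline."""
    return [x[j] if j in present else baseline[j] for j in range(len(x))]


def shapley_values(value_fn, x, baseline):
    """Exact Shapley attribution of value_fn(x) - value_fn(baseline) to each feature.

    Enumerates every subset of the other features, so the cost grows
    exponentially with the number of features; suitable only for toy cases.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = set(subset)
                with_i = without_i | {i}
                gain = (value_fn(mix(x, baseline, with_i))
                        - value_fn(mix(x, baseline, without_i)))
                phi[i] += weight * gain
    return phi


# Hypothetical scoring function over three illustrative features:
# relative height (m), distance from the water's edge (m), and whether the
# cell was vegetated in the current year (0 or 1).
def toy_score(features):
    height, distance, was_vegetated = features
    return (0.5 * height - 0.2 * distance + 0.3 * was_vegetated
            + 0.1 * height * was_vegetated)


FEATURE_NAMES = ["relative_height", "distance_to_water", "vegetation_history"]
cell = [1.5, 20.0, 1.0]      # cell being explained (made-up values)
baseline = [0.5, 50.0, 0.0]  # reference cell (made-up values)

phi = shapley_values(toy_score, cell, baseline)
for name, value in zip(FEATURE_NAMES, phi):
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the score of the explained cell and the score of the baseline, which is what allows per-feature contributions to be mapped cell by cell alongside the prediction itself.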

How to cite: Miyamoto, H. and Maeda, N.: A multi-stage machine learning application for predicting vegetation distribution and its factors in river channels, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-11162, https://doi.org/10.5194/egusphere-egu25-11162, 2025.