EGU24-9063, updated on 08 Mar 2024
https://doi.org/10.5194/egusphere-egu24-9063
EGU General Assembly 2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Can Catchment attributes coupled with an Ensemble of Machine Learning improve Flood Hazard mapping over large data-scarce catchments?

Mohit Mohanty and Vaibhav Tripathi
  • IIT Roorkee, Department of Water Resources Development and Management, Roorkee, India (mohit.mohanty@wr.iitr.ac.in)

Floods have had profound and devastating consequences for human life, turning flood research from a purely academic exercise into a critical socio-political imperative. To initiate effective flood risk management, many nations are developing user-friendly tools to identify flood-prone areas across extensive watersheds. Recently, Geomorphic Flood Descriptors (GFDs), which rely on the characteristics of the river drainage network and are computationally less demanding, have been used as an efficient alternative to complex hydraulic models. However, validating the flood inundation maps derived from GFDs remains a major challenge, especially in ungauged watersheds, where data scarcity limits the adoption of data-intensive hydraulic modeling. In addition, because weather patterns and climate variations introduce significant heterogeneity in flood patterns over large watersheds, reliable benchmark maps are needed to validate the GFDs. The present study explores the suitability of ensemble Machine Learning (ML) models for representing flooding at high resolution over large ungauged watersheds, thereby addressing the major research gap of validating GFD-derived flood maps against ground truth in ungauged basins. A suite of about 25 flood-influencing factors incorporating geomorphological, climatological, and soil parameters, such as the Geomorphic Flood Index (GFI), Topographic Wetness Index (TWI), Height Above the Nearest Drainage (HAND), slope, Stream Power Index (SPI), rainfall, soil type, and horizontal distance from the stream, was derived from a high-resolution DEM (CartoDEM, resolution ~30 m). The two most prominent tree-based ML techniques, Random Forest (RF) and Extreme Gradient Boosting (XGBoost), were employed to simulate flood inundation at a fine scale of 30 m in the severely flood-prone Mahanadi basin.
An ensemble of linear, random forest, and support vector machine models was further tested for geographical extrapolation to quantify the flood hazard in an ungauged basin, a task in which the tree-based models lagged behind. These ML models were trained on a flood inundation map derived from LISFLOOD-FP driven by the ERA5 reanalysis dataset. The GFD-derived flood map was tested against the LISFLOOD-FP flood map using a set of performance statistics. The classification models were evaluated using the area under the receiver operating characteristic curve (AUC), kappa coefficient, precision, recall, and F1 score, while RMSE and KGE were used for the regression models. The black-box nature of the ML models was also addressed using SHAP values, which quantify the degree of influence of each GFD on flood depth. The ongoing research also motivates the development of a global flood inundation atlas using RCMs, which could be used to compare and validate inundation over large regions through geomorphic analysis. Any uncertainty in flood inundation estimates may be greatly amplified when quantifying flood risk, which also includes the vulnerability and exposure dimensions.
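As an illustration of the performance statistics named above, the following minimal Python sketch computes RMSE and KGE for flood-depth regression, and precision, recall, F1 score, and the kappa coefficient for a binary flooded/non-flooded map. It is not the authors' code. The function names and the toy values are hypothetical, and KGE is assumed to follow the commonly used formulation based on correlation, variability ratio, and bias ratio, which the abstract does not specify.

```python
import math


def rmse(obs, sim):
    """Root mean square error between benchmark and simulated flood depths."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))


def kge(obs, sim):
    """Kling-Gupta efficiency (assumed form):
    1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)."""
    n = len(obs)
    mu_o = sum(obs) / n
    mu_s = sum(sim) / n
    sd_o = math.sqrt(sum((o - mu_o) ** 2 for o in obs) / n)
    sd_s = math.sqrt(sum((s - mu_s) ** 2 for s in sim) / n)
    cov = sum((o - mu_o) * (s - mu_s) for o, s in zip(obs, sim)) / n
    r = cov / (sd_o * sd_s)   # linear correlation
    alpha = sd_s / sd_o       # variability ratio
    beta = mu_s / mu_o        # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def binary_scores(obs, pred):
    """Precision, recall, F1, and Cohen's kappa for 0/1 flood labels."""
    pairs = list(zip(obs, pred))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    n = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    p_obs = (tp + tn) / n  # observed agreement
    # agreement expected by chance from the marginal totals
    p_exp = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (p_obs - p_exp) / (1 - p_exp) if p_exp != 1 else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "kappa": kappa}


# Toy example with hypothetical values (not results from the study)
benchmark_depth = [0.0, 0.5, 1.2, 2.0]   # e.g. hydraulic-model depths in m
predicted_depth = [0.1, 0.4, 1.0, 2.3]   # e.g. ML-predicted depths in m
print(rmse(benchmark_depth, predicted_depth))
print(kge(benchmark_depth, predicted_depth))
print(binary_scores([1, 1, 0, 0], [1, 0, 0, 0]))
```

In practice, each cell of the benchmark inundation map would supply one observed value and each cell of the ML or GFD-derived map one predicted value.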

Keywords: Flood hazard, Geomorphic Flood Descriptors, LISFLOOD-FP, Machine Learning, SHAP

How to cite: Mohanty, M. and Tripathi, V.: Can Catchment attributes coupled with an Ensemble of Machine Learning improve Flood Hazard mapping over large data-scarce catchments?, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-9063, https://doi.org/10.5194/egusphere-egu24-9063, 2024.