EGU General Assembly 2023
© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Extracting flood locations from news media data by the named entity recognition (NER) model to assess urban flood susceptibility

Shengnan Fu, Heng Lyu, Ze Wang, and Xin Hao
  • Dalian University of Technology, Water Resources and flood control, Construction Engineering, China

Flood susceptibility assessment, which identifies flood-prone areas, plays a significant role in flood hazard mitigation. Machine learning is an attractive assessment method because of its high objectivity and computational efficiency, but obtaining enough accurate information on historical flood locations to train the models remains a key problem. In recent years, news media data from both news websites and verified social media accounts has emerged as a promising source for natural science studies. However, its application to urban flood susceptibility assessment is still limited. This study proposed a three-task approach for using news media data for this purpose. First, flood locations were extracted from news media data using a named entity recognition (NER) model. Then, a frequency-based or distance-based data quality control method was applied to improve the representativeness of the extracted flood locations. Finally, flood conditioning factors, together with the historical flood locations, were input into a Support Vector Machine (SVM) model for flood susceptibility assessment. We took the central city of Dalian, China, as a case study. The results show that, for most flood conditioning factors, a t-test found no significant difference between the distributions at the flood locations from the news media data and those from the official planning report. In the resulting flood susceptibility map, the high flood susceptibility areas achieved a recall of 90% against the high flood hazard areas in the planning report. Frequency-based data quality control improved the precision of the flood susceptibility map by up to 5%, while the distance-based method was ineffective. This study provides an example of, and demonstrates the value of, applying new data sources and modern deep learning techniques to urban flood management.
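The abstract does not specify how the two quality control methods were defined, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes that the frequency-based method keeps locations mentioned at least a minimum number of times, and that the distance-based method keeps locations with at least one other location nearby. The sample records, the function names, and the thresholds `min_count` and `max_km` are all hypothetical.

```python
from collections import Counter
from math import asin, cos, radians, sin, sqrt

# Hypothetical records (place name, latitude, longitude), standing in for
# NER output that has already been geocoded. Not data from the study.
mentions = [
    ("Site A", 38.914, 121.615),
    ("Site A", 38.914, 121.615),
    ("Site A", 38.914, 121.615),
    ("Site B", 38.920, 121.620),
    ("Site B", 38.920, 121.620),
    ("Site C", 39.050, 121.900),
]


def frequency_filter(records, min_count=2):
    """Keep unique locations mentioned at least min_count times (assumed rule)."""
    counts = Counter(records)
    return [loc for loc, n in counts.items() if n >= min_count]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


def distance_filter(records, max_km=5.0):
    """Keep unique locations with another location within max_km (assumed rule)."""
    unique = list(dict.fromkeys(records))  # de-duplicate, preserving order
    kept = []
    for i, (name, lat, lon) in enumerate(unique):
        has_neighbour = any(
            haversine_km(lat, lon, other_lat, other_lon) <= max_km
            for j, (_, other_lat, other_lon) in enumerate(unique)
            if j != i
        )
        if has_neighbour:
            kept.append((name, lat, lon))
    return kept


frequent = frequency_filter(mentions, min_count=2)
clustered = distance_filter(mentions, max_km=5.0)
print([name for name, _, _ in frequent])
print([name for name, _, _ in clustered])
```

Under these assumptions, either filtered list could then be paired with the flood conditioning factors as positive samples for the SVM; the thresholds would need to be tuned against reference data such as the planning report.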

How to cite: Fu, S., Lyu, H., Wang, Z., and Hao, X.: Extracting flood locations from news media data by the named entity recognition (NER) model to assess urban flood susceptibility, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-3286, 2023.
