- ¹Department of Applied Physics, Universitat de Barcelona, Barcelona, Spain (cguzzon@meteo.ub.edu)
- ²Institute of Complex Systems, Universitat de Barcelona, Barcelona, Spain (ubics@ub.edu)
Spain and the Mediterranean coast are heavily affected by flash floods, which are generated by intense, localized storms over small basins (Gaume et al., 2016). In Spain, floods are the primary recurring natural disaster, accounting for nearly 70% of the compensation paid by the Consorcio de Compensación de Seguros (CCS, 2021). Improving early warning systems is crucial to reducing flood risk, and comprehensive, up-to-date databases of past flood events are essential tools for developing such systems.
This study presents the implementation of an AI-based text-mining tool designed to automate the creation and updating of flood event databases using information extracted from newspapers. This tool is tailored to enhance and expand INUNGAMA, an impact database of flood events in the Catalonia region (Barnolas and Llasat, 2007), by extracting data from ‘La Vanguardia’, a major Catalan newspaper. The text-mining tool involves several steps, starting with the retrieval of potentially relevant news through keyword-based queries on the newspaper’s online archive. To eliminate irrelevant news, a natural language processing (NLP) model filters the initial dataset. Impact data of flood events are extracted by analyzing the newspaper text with an advanced NLP model; the extracted information is saved in a machine-readable and consistent format. Finally, the tool integrates the extracted data with the pre-existing INUNGAMA database, either by merging new information with existing events or by creating entries for previously undocumented events.
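The final integration step described above (merge into an existing event, or create a new entry) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `FloodEvent` record, its fields, the one-day date tolerance and the matching rule (overlapping dates plus at least one shared municipality) are all assumptions made for illustration, and the place names and source labels are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class FloodEvent:
    """Hypothetical event record; the real INUNGAMA schema is not given in the abstract."""
    start: date
    end: date
    municipalities: set[str] = field(default_factory=set)
    sources: list[str] = field(default_factory=list)


def overlaps(a: FloodEvent, b: FloodEvent, tolerance_days: int = 1) -> bool:
    """True if the two date ranges overlap, allowing a small gap (assumed tolerance)."""
    tol = timedelta(days=tolerance_days)
    return a.start <= b.end + tol and b.start <= a.end + tol


def integrate(database: list[FloodEvent], extracted: FloodEvent) -> str:
    """Merge `extracted` into the first matching event in place, else append it.

    Returns "merged" or "created" so the caller can log what happened.
    """
    for event in database:
        # Assumed matching rule: overlapping dates and a shared municipality.
        if overlaps(event, extracted) and event.municipalities & extracted.municipalities:
            event.start = min(event.start, extracted.start)
            event.end = max(event.end, extracted.end)
            event.municipalities |= extracted.municipalities
            event.sources.extend(extracted.sources)
            return "merged"
    database.append(extracted)
    return "created"


# Placeholder data, not taken from INUNGAMA or any newspaper.
db = [FloodEvent(date(2020, 1, 1), date(2020, 1, 2), {"Town A"}, ["existing-record"])]
first = integrate(db, FloodEvent(date(2020, 1, 2), date(2020, 1, 3), {"Town A", "Town B"}, ["article-2"]))
second = integrate(db, FloodEvent(date(2020, 6, 10), date(2020, 6, 10), {"Town A"}, ["article-3"]))
```

In this sketch the first article extends the existing event, while the second, months later, becomes a new entry. A production version would likely need a more careful matching rule, for instance one based on basins rather than municipalities.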
The tool was calibrated and tested using the INUNGAMA database. Its ability to download and filter relevant articles was assessed over six non-consecutive months, demonstrating excellent performance in identifying and distinguishing flood events. Furthermore, the AI model exhibited high accuracy in extracting impact data from the text when tested over one year of newspaper data.
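The abstract does not state which metrics were used in this assessment. As one plausible way to score the download-and-filter stage against a reference database, the sketch below computes precision and recall over sets of event identifiers; the function name and the identifiers are hypothetical.

```python
def precision_recall(predicted: set[str], reference: set[str]) -> tuple[float, float]:
    """Score a set of retrieved event identifiers against a reference set.

    precision = share of retrieved events that are in the reference set
    recall    = share of reference events that were retrieved
    """
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Placeholder identifiers for illustration only.
retrieved = {"ev1", "ev2", "ev3"}
known = {"ev2", "ev3", "ev4", "ev5"}
p, r = precision_recall(retrieved, known)
```

Events retrieved by the tool but absent from the reference set lower precision under this scoring, even though some of them may be genuine, previously undocumented floods, so such cases would need manual review.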
The proposed AI-based tool automates the creation and updating of flood impact databases, providing a solid foundation for developing early warning systems aimed at risk reduction. It is designed to fill gaps in the INUNGAMA database and to bring it up to date. Moreover, it can be adapted to create new databases for other regions using different newspaper sources.
This research was carried out within the framework of the Flood2Now project, Grant PLEC2022-009403 funded by MCIN/AEI/10.13039/501100011033 and by the European Union NextGenerationEU/PRTR, and the I-CHANGE Project from the European Union’s Horizon 2020 Research and Innovation programme under grant agreement 101037193.
How to cite: Guzzon, C., Marcos Matamoros, R., Marinelli, D., Llasat-Botija, M., and Llasat-Botija, M. C.: An AI-Based Text-Mining Tool for flood impact data extraction from newspaper information, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-15719, https://doi.org/10.5194/egusphere-egu25-15719, 2025.