- 1 Cartography and GIS Research Group, Department of Geography, Vrije Universiteit Brussel, Brussels, Belgium
- 2 Natural Hazards and Cartography Service, Department of Earth Sciences, Royal Museum for Central Africa, Tervuren, Belgium
Efforts worldwide aim to collect detailed information on the spatial and temporal distribution of natural hazards to improve our understanding of their occurrence and ultimately prevent their impacts. However, data on the location, timing, and impact of hazards remain scarce in many regions, even in the most exposed ones. Data collection methods are usually framed around earth observation approaches, sometimes combined with citizen science. Such approaches can be time-consuming and resource-intensive, and may fall short of data needs, especially at large scales. Combining these methods with complementary approaches could better address these challenges. We introduce a multilingual tool that uses natural language processing techniques to extract information on geo-hydrological hazards from online news articles. The tool was developed through a worldwide application in which we processed ~5.8 million articles published between 2017 and 2023 across 58 languages. The articles were extracted from GDELT (Global Database of Events, Language, and Tone), a global database monitoring events through online news articles. Using large language models, the tool analyzes articles at the paragraph level through three major steps: (1) filtering paragraphs for relevancy, (2) extracting information on the location (down to street level), timing, and impact, and (3) clustering information into events. This multilingual approach enabled the tool to extract and analyze 12,438 flood events, 1,312 landslide events, and 1,086 flash flood events globally for 2023 alone, providing ~20 times more data than current disaster databases and improving coverage worldwide. In regions such as South and Central America, Europe, and Asia, where English is not the primary reporting language, non-English texts were the most important source of information. This was especially the case in South and Central America, where non-English (primarily Spanish and Portuguese) paragraphs outnumbered English paragraphs by a factor of five. The proposed tool provides a new way to extract an unprecedented amount of data on geo-hydrological hazards, forming a complementary source of information to existing methodologies. Beyond geo-hydrological hazards, the tool can be used to document other hazards, including earthquakes, wildfires, or volcanic activity. In addition, with this specific application, we provide a new extensive global dataset on impactful geo-hydrological hazards, which offers new opportunities for improving our understanding of these processes and their impact at continental to global scales.
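As a rough illustration of the filter and cluster steps described in the abstract, the following Python sketch may help. It is not the authors' implementation: a simple multilingual keyword match stands in for the large language model filter, the location and date of each paragraph are assumed to have been extracted already, and all names (`HAZARD_KEYWORDS`, `Record`, `filter_paragraph`, `cluster_records`, `max_gap_days`) as well as the sample paragraphs are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical keyword stems; the tool described in the abstract uses
# large language models for this step, not keyword matching.
HAZARD_KEYWORDS = {
    "flood": ("flood", "inundaci", "inondation"),
    "landslide": ("landslide", "deslizamiento", "glissement de terrain"),
}


@dataclass
class Record:
    hazard: str
    location: str
    day: date


def filter_paragraph(text):
    """Step 1 stand-in: return a hazard type if a keyword matches, else None."""
    lowered = text.lower()
    for hazard, stems in HAZARD_KEYWORDS.items():
        if any(stem in lowered for stem in stems):
            return hazard
    return None


def cluster_records(records, max_gap_days=3):
    """Step 3 stand-in: group records that share hazard and location and
    whose dates lie within max_gap_days of the previous record in the group."""
    events = []
    for rec in sorted(records, key=lambda r: (r.hazard, r.location, r.day)):
        last = events[-1] if events else None
        if (
            last
            and last[-1].hazard == rec.hazard
            and last[-1].location == rec.location
            and rec.day - last[-1].day <= timedelta(days=max_gap_days)
        ):
            last.append(rec)
        else:
            events.append([rec])
    return events


# Made-up paragraphs; location and date are assumed outputs of step 2.
paragraphs = [
    ("Heavy flooding closed roads in the centre.", "Town A", date(2023, 7, 1)),
    ("Las inundaciones continuaron ayer.", "Town A", date(2023, 7, 2)),
    ("The mayor opened a new library.", "Town A", date(2023, 7, 2)),
    ("A landslide blocked the road.", "Town B", date(2023, 7, 2)),
    ("Flood waters returned after new storms.", "Town A", date(2023, 8, 15)),
]

records = [
    Record(hazard, location, day)
    for text, location, day in paragraphs
    if (hazard := filter_paragraph(text)) is not None
]
events = cluster_records(records)

for event in events:
    print(event[0].hazard, event[0].location, len(event), "paragraph(s)")
```

In this toy example the English and Spanish flood paragraphs from consecutive days merge into one event, while the later flood report and the landslide report form separate events. The three-day gap is an arbitrary choice made for the sketch.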
How to cite: Valkenborg, B., Dewitte, O., and Smets, B.: A multilingual tool for the documentation of impactful geo-hydrological hazards using online news articles: a worldwide application, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-6548, https://doi.org/10.5194/egusphere-egu25-6548, 2025.