EGU24-7652, updated on 08 Mar 2024
https://doi.org/10.5194/egusphere-egu24-7652
EGU General Assembly 2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

A semi-automatic natural language tool to minimize systematic biases in geo-hydrological disaster datasets in tropical Africa

Bram Valkenborg¹, Olivier Dewitte², and Benoît Smets¹,²
  • ¹ Cartography and GIS Research Group, Department of Geography, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
  • ² Natural Hazards and Cartography Service, Department of Earth Sciences, Royal Museum for Central Africa, Leuvensesteenweg 13, 3080 Tervuren, Belgium

Tropical Africa is highly susceptible to geo-hydrological hazards, yet these hazards and their impacts remain poorly documented in existing disaster databases. Only high-impact events that attract wide attention are reported manually, which creates systematic biases. Natural Language Processing has the potential to automate the documentation of geo-hydrological disasters. This research focuses on developing a semi-automated tool to extract information from online press articles and social media posts. Fine-tuned Large Language Models perform a series of tasks, such as question answering, zero-shot classification, and named-entity recognition, to extract information from these online sources. A three-step approach is proposed for the detection of events: (1) filtering posts or articles by relevance, (2) extracting information on location, timing, and impact, and (3) merging and sorting this information to document the identified events in a structured disaster database. Shortcomings compared to a manual approach remain. These mainly relate to the complexity of the text and to toponymic ambiguity when geocoding events. The tool is therefore complementary to other information-gathering approaches. These new sources of information will improve our understanding of the distribution of disasters related to geo-hydrological hazards, especially in data-scarce contexts. Future work will combine this semi-automated tool with remote sensing and citizen science data to further reduce systematic biases in disaster datasets.
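The three-step approach can be illustrated with a minimal Python sketch. It is not the tool described in this abstract: a keyword match stands in for zero-shot classification, and a small place list with regular expressions stands in for named-entity recognition and question answering. All names (`HAZARD_TERMS`, `PLACES`, `Record`, `is_relevant`, `extract`, `merge`, `build_database`) and the example posts are hypothetical and serve only to show the structure filter, extract, merge.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: keyword list standing in for a zero-shot relevance classifier.
HAZARD_TERMS = ("landslide", "flood")
# Assumption: tiny place list standing in for entity recognition and geocoding.
PLACES = ("Bukavu", "Kasese")


@dataclass
class Record:
    hazard: str
    location: str
    date: str
    fatalities: Optional[int]


def is_relevant(text: str) -> bool:
    """Step 1: keep only texts that mention a hazard term."""
    lowered = text.lower()
    return any(term in lowered for term in HAZARD_TERMS)


def extract(text: str) -> Optional[Record]:
    """Step 2: pull hazard type, location, date and impact from one text."""
    lowered = text.lower()
    hazard = next((t for t in HAZARD_TERMS if t in lowered), None)
    location = next((p for p in PLACES if p in text), None)
    date = re.search(r"\d{4}-\d{2}-\d{2}", text)
    deaths = re.search(r"(\d+)\s+(?:people\s+)?(?:dead|killed|died)", lowered)
    if hazard is None or location is None or date is None:
        return None  # incomplete records are dropped in this sketch
    fatalities = int(deaths.group(1)) if deaths else None
    return Record(hazard, location, date.group(0), fatalities)


def merge(records: list) -> list:
    """Step 3: merge reports of the same event and sort events by date."""
    events = {}
    for r in records:
        key = (r.date, r.location, r.hazard)
        event = events.setdefault(key, {
            "date": r.date, "location": r.location, "hazard": r.hazard,
            "fatalities": None, "sources": 0,
        })
        event["sources"] += 1
        if r.fatalities is not None:
            # Keep the highest reported figure when sources disagree.
            event["fatalities"] = max(event["fatalities"] or 0, r.fatalities)
    return [events[key] for key in sorted(events)]


def build_database(posts: list) -> list:
    """Run the three steps in sequence on a list of raw texts."""
    relevant = [p for p in posts if is_relevant(p)]
    records = [r for r in map(extract, relevant) if r is not None]
    return merge(records)
```

Calling `build_database` on a list of raw texts returns one dictionary per event, ordered by date, with the number of supporting sources. Keeping the highest reported fatality count is one possible merging rule chosen for the sketch; which rule is appropriate, and how ambiguous place names are resolved, are exactly the points where manual review remains necessary.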

How to cite: Valkenborg, B., Dewitte, O., and Smets, B.: A semi-automatic natural language tool to minimize systematic biases in geo-hydrological disaster datasets in tropical Africa, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-7652, https://doi.org/10.5194/egusphere-egu24-7652, 2024.