Building a Comprehensive Drought Impact Dataset by Integrating Disaster Databases and Reports with the use of Large Language Models.

Federico Ghiggini; Daria Ottonelli; Eva Trasforini; Edoardo Cremonese; Mirko D'Andrea; Tatiana Ghizzoni; Roberto Rudari

doi:https://doi.org/10.5194/egusphere-egu26-18357


EGU26-18357, updated on 14 Mar 2026


EGU General Assembly 2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.


Federico Ghiggini¹,², Daria Ottonelli², Eva Trasforini², Edoardo Cremonese², Mirko D'Andrea², Tatiana Ghizzoni², and Roberto Rudari²


¹University of Genoa, Department of Computer Science, Bioengineering, Robotics, and Systems Engineering - DIBRIS, Italy (federico.ghiggini@cimafoundation.org)
²CIMA Research Foundation

Droughts are among the most widespread and damaging natural hazards, yet information on individual events remains fragmented and difficult to compare across countries, even though it is essential for drought risk assessment and for mitigation and adaptation strategies. This work therefore focuses on building a detailed, event-based drought database for the African continent by combining and expanding existing disaster data sources. Beyond serving as a comprehensive archive of past events, the database is intended to support the empirical derivation of drought impact functions and vulnerability curves.

The database relies on impact data from past drought events extracted from two main types of sources: global disaster loss databases and disaster reports. For the first category, three widely used platforms are adopted as a starting point: EM-DAT, IDMC, and DesInventar, with records limited to the 2012–2024 period. The three databases differ substantially in the types of impact recorded, each reflecting a different dimension of impact, as well as in data structure, spatial resolution, and temporal detail. Regarding impact types, EM-DAT primarily reports affected populations, IDMC focuses on displaced populations, and DesInventar includes both affected people and damaged cropland expressed in hectares.

A comparative analysis of the three databases enabled the construction of an integrated dataset. Within the study period, EM-DAT reports 87 events across 31 countries, IDMC 31 events in 12 countries, and DesInventar 26 events in 10 countries. When considering only country and year of event onset or registration, 20 intersecting events were identified, with only two events common to all three databases. Although integration enriches the original datasets, substantial uncertainty remains in both the identification of individual drought events and the consistent quantification of impacts, mainly due to the limited overlap among sources.
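The country-and-year matching described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layout (`country`, `year`), the function names, and the toy records are all assumptions made for illustration, and none of the values come from EM-DAT, IDMC, or DesInventar.

```python
from collections import defaultdict


def match_events(sources):
    """Map each (country, year) key to the set of source names reporting it."""
    index = defaultdict(set)
    for name, records in sources.items():
        for rec in records:
            index[(rec["country"], rec["year"])].add(name)
    return index


def overlap_summary(index, n_sources):
    """Return keys reported by at least two sources, and keys reported by all."""
    shared = {key for key, names in index.items() if len(names) >= 2}
    in_all = {key for key, names in index.items() if len(names) == n_sources}
    return shared, in_all


# Toy records, invented for illustration only (hypothetical field names).
sources = {
    "EM-DAT": [{"country": "SOM", "year": 2016}, {"country": "KEN", "year": 2014}],
    "IDMC": [{"country": "SOM", "year": 2016}, {"country": "ETH", "year": 2015}],
    "DesInventar": [{"country": "SOM", "year": 2016}, {"country": "ETH", "year": 2015}],
}

index = match_events(sources)
shared, in_all = overlap_summary(index, len(sources))
```

Matching on country and year alone is deliberately coarse: two distinct droughts in the same country and year collapse into one key, and a multi-year event registered under different years in different sources is missed, which is one way the uncertainty noted above can arise.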

To address these limitations, the study explores disaster reports through the use of artificial intelligence. A prompt-based approach using large language models is developed to extract structured information from unstructured text, including event timing, location, impacts, and affected sectors.
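As a rough sketch of what a prompt-based extraction step could look like in Python, the example below builds a prompt, parses a JSON reply, and keeps only the expected fields. The field names, the prompt wording, and the `call_model` callable are hypothetical; the abstract does not specify the prompt, the model, or the output schema, and `fake_model` is a stand-in so the sketch runs without any external service.

```python
import json

# Hypothetical output schema, loosely following the information types named in the text.
FIELDS = ("start_date", "end_date", "location", "impacts", "affected_sectors")


def build_prompt(report_text):
    """Compose an instruction asking for one JSON object with fixed keys."""
    schema = ", ".join(f'"{field}"' for field in FIELDS)
    return (
        "Extract drought event information from the report below. "
        f"Reply with a single JSON object with keys {schema}. "
        "Use null for anything the report does not state.\n\n"
        f"REPORT:\n{report_text}"
    )


def parse_reply(reply):
    """Pull the outermost JSON object out of a reply and keep only known fields."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    data = json.loads(reply[start:end + 1])
    return {field: data.get(field) for field in FIELDS}


def extract_event(report_text, call_model):
    """Run one report through a caller-supplied model function."""
    return parse_reply(call_model(build_prompt(report_text)))


def fake_model(prompt):
    # Canned reply standing in for a real model call (assumption, not real output).
    return (
        'Here is the result: {"location": "Somaliland", '
        '"impacts": ["livestock losses"], "start_date": null}'
    )


record = extract_event("placeholder report text", fake_model)
```

Passing the model as a callable keeps the parsing and validation logic independent of any particular provider, and fields the reply omits come back as `None` rather than being guessed.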

The AI-based extraction is implemented within a Python workflow to automate data processing and reduce manual curation. The approach has been tested in Somalia using 17 reports from United Nations agencies, government sources, and humanitarian organizations. Independent information on drought events in the Somaliland region was used for validation. Results show that the AI-assisted extraction successfully identifies drought events already present in the integrated database while providing more detailed impact descriptions, including clearer differentiation of affected populations consistent with IPCC classifications and explicit identification of impact drivers. The methodology is intended to be extended to the entire African continent.

How to cite: Ghiggini, F., Ottonelli, D., Trasforini, E., Cremonese, E., D'Andrea, M., Ghizzoni, T., and Rudari, R.: Building a Comprehensive Drought Impact Dataset by Integrating Disaster Databases and Reports with the use of Large Language Models., EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-18357, https://doi.org/10.5194/egusphere-egu26-18357, 2026.