EGU25-8954, updated on 14 Mar 2025
https://doi.org/10.5194/egusphere-egu25-8954
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Wednesday, 30 Apr, 16:15–18:00 (CEST), Display time Wednesday, 30 Apr, 14:00–18:00 | Hall X3, X3.25
Wikimpacts 1.0: A new global climate impact database based on automated information extraction from Wikipedia
Ni Li1,2, Wim Thiery1, Shorouq Zahra3,6, Mariana Madruga de Brito4, Koffi Worou5,6, Murathan Kurfali3,6, Seppe Lampe1, Paul Munoz1, Clare Flynn5,6, Camila Trigoso1, Joakim Nivre3,6,7, Jakob Zscheischler2,8,9, and Gabriele Messori5,6,10
  • 1Department of Water and Climate, Vrije Universiteit Brussel, Brussels, Belgium
  • 2Department of Hydro Sciences, TUD Dresden University of Technology, Dresden, Germany
  • 3RISE Research Institutes of Sweden, Sweden
  • 4Department of Urban and Environmental Sociology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
  • 5Department of Earth Sciences, Uppsala University, Uppsala, Sweden
  • 6Swedish Centre for Impacts of Climate Extremes (climes), Uppsala University, Uppsala, Sweden
  • 7Department of Linguistics and Philology, Uppsala University, Uppsala, Sweden
  • 8Department of Compound Environmental Risks, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
  • 9Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), Dresden/Leipzig, Germany
  • 10Department of Meteorology and Bolin Centre for Climate Research, Stockholm University, Stockholm, Sweden

Extreme climate events such as storms, heatwaves, wildfires, floods, and droughts pose serious threats to human society and ecosystems. Quantifying their impacts remains a major scientific challenge. Data linking climate hazards to socio-economic effects are crucial, yet their public availability is still relatively sparse. Existing open databases such as the Emergency Events Database (EM-DAT) and DesInventar Sendai offer some impact data on climate extremes, but such data also appear in newspapers, reports, and online sources like Wikipedia.

We introduce Wikimpacts 1.0, a comprehensive global database of climate impacts built with natural language processing techniques. The pipeline comprises document selection, information extraction with the GPT-4o large language model, post-processing, and data consolidation. In this release, we have processed 3,368 Wikipedia articles. Impact data for each event are recorded at three levels: event, national, and sub-national. Impact categories include deaths, injuries, homelessness, displacement, affected individuals, damaged buildings, and insured or total economic damage. The dataset covers 2,928 events from 1034 to 2024, with 20,186 national and 36,394 sub-national data entries. Comparison with manually annotated data from 156 events shows that the Wikimpacts database is highly accurate at the event level for time, location, deaths, and economic damage, while information on injuries, affected individuals, homelessness, displacement, and building damage is slightly less precise. An analysis of the period 1900 to 2024 shows that the sub-national data provide more comprehensive coverage of tropical and extratropical storms and wildfires than EM-DAT, with enhanced data on events in countries such as the United States, Mexico, Canada, and Australia. Our study highlights the potential of natural language processing for creating open databases with reliable information on the impacts of climate events.
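
To make the three recording levels concrete, the following is a minimal Python sketch of how impact entries at the event, national, and sub-national level could be represented and queried. It is an illustration only: the class names, field names, and category labels are assumptions made for this example and do not reflect the actual Wikimpacts schema, which the abstract does not specify. The example event and its values are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical category labels, loosely following the impact
# categories listed in the abstract (not the real schema).
IMPACT_CATEGORIES = (
    "deaths", "injuries", "homeless", "displaced", "affected",
    "buildings_damaged", "insured_damage", "total_damage",
)


@dataclass
class ImpactEntry:
    """One impact value; its level follows from which locations are set."""
    category: str
    value: float
    country: Optional[str] = None  # set for national and sub-national entries
    region: Optional[str] = None   # set only for sub-national entries

    def __post_init__(self) -> None:
        if self.category not in IMPACT_CATEGORIES:
            raise ValueError(f"unknown impact category: {self.category}")
        if self.region is not None and self.country is None:
            raise ValueError("a sub-national entry needs a country")

    @property
    def level(self) -> str:
        if self.region is not None:
            return "sub-national"
        if self.country is not None:
            return "national"
        return "event"


@dataclass
class Event:
    """An event with impact entries at any of the three levels."""
    name: str
    start_year: int
    entries: List[ImpactEntry] = field(default_factory=list)

    def by_level(self, level: str) -> List[ImpactEntry]:
        return [e for e in self.entries if e.level == level]

    def national_total(self, category: str) -> float:
        """Sum one category over all national-level entries."""
        return sum(
            e.value for e in self.by_level("national") if e.category == category
        )


# Invented example data, for illustration only.
storm = Event("Example storm", 2020)
storm.entries.append(ImpactEntry("deaths", 12))                            # event level
storm.entries.append(ImpactEntry("deaths", 9, country="A"))                # national
storm.entries.append(ImpactEntry("deaths", 3, country="B"))                # national
storm.entries.append(ImpactEntry("deaths", 4, country="A", region="A-1"))  # sub-national
```

In a layout like this, comparing `storm.national_total("deaths")` with the event-level value is one simple consistency check that a consolidation step might perform; whether the Wikimpacts pipeline does so is not stated in the abstract.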

 

How to cite: Li, N., Thiery, W., Zahra, S., Madruga de Brito, M., Worou, K., Kurfali, M., Lampe, S., Munoz, P., Flynn, C., Trigoso, C., Nivre, J., Zscheischler, J., and Messori, G.: Wikimpacts 1.0: A new global climate impact database based on automated information extraction from Wikipedia, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-8954, https://doi.org/10.5194/egusphere-egu25-8954, 2025.