Extreme climate events such as storms, heatwaves, wildfires, floods, and droughts pose serious threats to human society and ecosystems. Quantifying their impacts remains a major scientific challenge. Data linking climate hazards to socio-economic impacts are essential, yet publicly available data remain sparse. Existing open databases such as the Emergency Events Database (EM-DAT) and DesInventar Sendai offer some impact data on climate extremes, but such data also appear in newspapers, reports, and online sources like Wikipedia.
We introduce Wikimpacts 1.0, a comprehensive global database of climate impacts built with natural language processing techniques. The database uses the GPT-4o large language model for information extraction, combined with document selection, post-processing, and data consolidation. In this release, we processed 3,368 Wikipedia articles. Impact data for each event are recorded at three levels: event, national, and sub-national. Impact categories include the number of deaths, injuries, homeless people, displaced people, affected people, and damaged buildings, as well as insured and total economic damage. The dataset covers 2,928 events from 1034 to 2024, with 20,186 national and 36,394 sub-national entries. Comparison with manually annotated data for 156 events shows that the Wikimpacts database is highly accurate at the event level for time, location, deaths, and economic damage, while information on injuries, affected people, homelessness, displacement, and building damage is slightly less accurate. An analysis of the period 1900 to 2024 shows that the sub-national data provide more comprehensive coverage of tropical storms, extratropical storms, and wildfires than EM-DAT, with richer data on events in countries such as the United States, Mexico, Canada, and Australia. Our study highlights the potential of natural language processing for creating open databases with reliable information on the impacts of climate events.
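To make the three-level organisation concrete, the following is a minimal sketch of how one event and its impact entries could be represented. It is an illustration only: the class names, field names, and category labels are hypothetical and are not the actual Wikimpacts schema, and the values in the example are placeholders rather than data from the database.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical labels for the impact categories named in the text.
IMPACT_CATEGORIES = (
    "deaths", "injuries", "homeless", "displaced", "affected",
    "buildings_damaged", "insured_damage", "total_damage",
)


@dataclass
class ImpactEntry:
    """One impact value; its level follows from how specific the location is."""
    category: str
    value: Optional[float] = None
    country: Optional[str] = None                # set for national and sub-national entries
    sub_national_location: Optional[str] = None  # set only for sub-national entries

    def __post_init__(self) -> None:
        if self.category not in IMPACT_CATEGORIES:
            raise ValueError(f"unknown impact category: {self.category}")

    @property
    def level(self) -> str:
        if self.sub_national_location is not None:
            return "sub-national"
        if self.country is not None:
            return "national"
        return "event"


@dataclass
class Event:
    """An extreme event with impact entries at any of the three levels."""
    event_id: str
    hazard_type: str
    start_year: int
    entries: List[ImpactEntry] = field(default_factory=list)

    def entries_at(self, level: str) -> List[ImpactEntry]:
        return [e for e in self.entries if e.level == level]


# Placeholder values for illustration only (not taken from the database).
example = Event(
    event_id="demo-0001",
    hazard_type="tropical storm",
    start_year=2000,
    entries=[
        ImpactEntry("deaths", 12),
        ImpactEntry("deaths", 10, country="Country A"),
        ImpactEntry("deaths", 7, country="Country A", sub_national_location="Region X"),
        ImpactEntry("total_damage", 1.5e6),
    ],
)
```

Calling `example.entries_at("sub-national")` returns only the entries tied to a sub-national location, which mirrors how event-level, national, and sub-national impacts are kept apart while still belonging to the same event.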