Text Mining of Loss Data
- The Netherlands Red Cross, 510, Netherlands (jmargutti@redcross.nl)
Structured datasets of loss data, i.e. data on the impact of past natural disasters, are of paramount importance for informing disaster preparedness programs and forecasting the impact of future disasters. Most existing initiatives aim to build such datasets manually from information provided by governments, humanitarian agencies and researchers. Unfortunately, the quality and completeness of such information are often insufficient, especially for small disasters and/or in areas where these organisations are not active. More often, it is local and national newspapers that report on small disasters. In this contribution, we present a series of algorithms to automatically extract structured loss data from online newspapers, even small ones that are not captured by common news aggregators (e.g. Google News). The algorithms are validated in terms of both accuracy of extraction and consistency with existing datasets; we argue that they provide a valuable tool for collecting loss data in data-poor regions.
How to cite: Margutti, J. and van den Homberg, M.: Text Mining of Loss Data, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-20048, https://doi.org/10.5194/egusphere-egu2020-20048, 2020