Automatized drought impact detection from newspaper articles using natural language processing and machine learning
- ¹Helmholtz-Centre for Environmental Research, Department of Urban and Environmental Sociology, Leipzig, Germany
- ²Institute for Environmental Sciences and Geography, University of Potsdam, Potsdam, Germany
Droughts are expected to increase in both frequency and magnitude across Europe. While they impose diverse impacts on social-ecological systems, most impact assessments focus on single sectors or on economic aspects. Existing multi-sectoral datasets are limited in spatio-temporal homogeneity and scope because impacts are extracted manually from text-based sources. To address this, we developed a novel method for the automated detection of drought impacts from newspaper articles. Using natural language processing and machine learning models, our method extracts different classes of drought impacts (e.g. agriculture, forestry, livestock) together with their geographic and temporal scope from text data. We applied this method to generate a multi-sectoral dataset of drought impacts in Germany between 2000 and 2021, considering about 41,121 articles from different newspapers. When evaluated on a human-annotated dataset, the automatic classification of impacts reached accuracy levels of 92–96% per impact class. In a validation against independent data, initial results show that our method can replicate both temporal and spatial trends. Our approach advances existing techniques because it (1) requires a substantially lower workload, (2) can process large amounts of data, (3) reduces subjectivity and human bias, and (4) generalizes to other hazard types and text corpora while achieving sufficient levels of accuracy. The findings highlight the applicability of natural language processing and machine learning for creating comprehensive impact datasets. Furthermore, the generated information can be used to validate drought risk assessments and impact models.
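The abstract reports accuracy per impact class against a human-annotated dataset but does not say how that figure is computed. The following is a minimal sketch that assumes each article can carry several impact classes and that each class is scored as a separate present/absent decision (one-vs-rest). The function name `per_class_accuracy`, the class tuple and the toy labels are hypothetical illustrations, not the authors' implementation or data.

```python
# Hypothetical impact classes, taken from the examples named in the abstract.
IMPACT_CLASSES = ("agriculture", "forestry", "livestock")


def per_class_accuracy(annotated, predicted, classes=IMPACT_CLASSES):
    """Return {class: accuracy}, scoring each class as an independent yes/no decision.

    Assumption: annotated and predicted are equal-length lists holding one set
    of impact-class labels per article (an empty set means "no impact found").
    """
    if len(annotated) != len(predicted):
        raise ValueError("annotated and predicted must have the same length")
    if not annotated:
        raise ValueError("need at least one article")
    scores = {}
    for cls in classes:
        # An article counts as correct for this class when the model and the
        # human annotator agree on whether the class is present (TP or TN).
        agree = sum((cls in gold) == (cls in pred)
                    for gold, pred in zip(annotated, predicted))
        scores[cls] = agree / len(annotated)
    return scores


if __name__ == "__main__":
    # Toy labels for four articles (invented for illustration only).
    gold = [{"agriculture"}, {"agriculture", "forestry"}, set(), {"livestock"}]
    pred = [{"agriculture"}, {"agriculture"}, set(), {"livestock", "agriculture"}]
    print(per_class_accuracy(gold, pred))
    # {'agriculture': 0.75, 'forestry': 0.75, 'livestock': 1.0}
```

Scoring each class on its own makes the numbers comparable to a "per impact class" figure. For rare classes, however, high accuracy can come mostly from true negatives, which is why precision and recall per class are often reported alongside it.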
How to cite: Sodoge, J., de Brito, M. M., and Kuhlicke, C.: Automatized drought impact detection from newspaper articles using natural language processing and machine learning, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-511, https://doi.org/10.5194/egusphere-egu22-511, 2022.