EGU General Assembly 2022
© Author(s) 2022. This work is distributed under
the Creative Commons Attribution 4.0 License.

Automatized drought impact detection from newspaper articles using natural language processing and machine learning

Jan Sodoge1, Mariana Madruga de Brito1, and Christian Kuhlicke1,2
  • 1Helmholtz-Centre for Environmental Research, Department of Urban and Environmental Sociology, Leipzig, Germany
  • 2Institute for Environmental Sciences and Geography, University of Potsdam, Potsdam, Germany

Droughts are expected to increase in both frequency and magnitude across Europe. Although they have diverse impacts on social-ecological systems, most impact assessments focus on particular sectors or on economic aspects. Existing multi-sectoral datasets are limited in spatio-temporal homogeneity and scope because impacts are extracted manually from text-based sources. To address this, we developed a novel method for the automated detection of drought impacts from newspaper articles. Using natural language processing and machine learning models, the method extracts different classes of drought impacts (e.g. agriculture, forestry, livestock), together with their geographic and temporal scope, from text data. We applied it to generate a multi-sectoral dataset of drought impacts in Germany between 2000 and 2021, based on about 41,121 articles from different newspapers. When evaluated against a human-annotated dataset, the automatic classification achieved accuracies of 92–96% per impact class. In a validation against independent data, first results indicate that the method can replicate both temporal and spatial trends. Our approach advances existing techniques because it (1) requires a significantly lower workload, (2) can handle large amounts of data, (3) reduces subjectivity and human bias, and (4) is generalizable to other hazard types and text corpora, while still achieving sufficient levels of accuracy. The findings highlight the applicability of natural language processing and machine learning for creating comprehensive impact datasets. Furthermore, the generated information can be used to validate drought risk assessments and impact models.
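The classification step and its per-class evaluation can be illustrated with a minimal sketch. It assumes a one-vs-rest setup: one binary classifier per impact class, each trained on human-annotated articles. The abstract does not say which NLP or machine learning models were used. Multinomial Naive Bayes, the crude tokenizer, the toy sentences, and the function names (`train_one_vs_rest`, `predict_impacts`, `per_class_accuracy`) are therefore hypothetical illustration choices, not the authors' pipeline. Here "accuracy per impact class" is read as the share of test articles for which the predicted presence or absence of a class matches the human annotation.

```python
import math
import re
from collections import Counter

# Impact classes named in the abstract; everything else here is a hypothetical illustration.
CLASSES = ["agriculture", "forestry", "livestock"]


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of letters (a deliberately crude tokenizer)."""
    return re.findall(r"[a-zäöüß]+", text.lower())


class BinaryNaiveBayes:
    """Multinomial Naive Bayes deciding whether one impact class is present (1) or absent (0)."""

    def fit(self, docs: list[list[str]], labels: list[int]) -> "BinaryNaiveBayes":
        self.vocab = {tok for doc in docs for tok in doc}
        self.counts = {0: Counter(), 1: Counter()}
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
        n_docs = Counter(labels)
        # Add-one smoothed class priors, so a class with no positive examples still gets a prior.
        self.log_prior = {y: math.log((n_docs[y] + 1) / (len(labels) + 2)) for y in (0, 1)}
        self.totals = {y: sum(self.counts[y].values()) for y in (0, 1)}
        return self

    def predict(self, doc: list[str]) -> int:
        v = len(self.vocab)
        scores = {}
        for y in (0, 1):
            score = self.log_prior[y]
            for tok in doc:
                if tok in self.vocab:  # words never seen in training are ignored
                    # Laplace-smoothed log likelihood of the token given the label.
                    score += math.log((self.counts[y][tok] + 1) / (self.totals[y] + v))
            scores[y] = score
        return max(scores, key=scores.get)


def train_one_vs_rest(texts: list[str], label_sets: list[set[str]]) -> dict[str, BinaryNaiveBayes]:
    """Train one binary classifier per impact class on the same annotated articles."""
    docs = [tokenize(t) for t in texts]
    return {c: BinaryNaiveBayes().fit(docs, [int(c in ls) for ls in label_sets]) for c in CLASSES}


def predict_impacts(models: dict[str, BinaryNaiveBayes], text: str) -> set[str]:
    """Return every impact class whose classifier predicts 'present' (multi-label output)."""
    doc = tokenize(text)
    return {c for c, m in models.items() if m.predict(doc) == 1}


def per_class_accuracy(models, texts: list[str], gold: list[set[str]]) -> dict[str, float]:
    """Share of articles where predicted presence/absence of each class matches the annotation."""
    preds = [predict_impacts(models, t) for t in texts]
    return {c: sum((c in p) == (c in g) for p, g in zip(preds, gold)) / len(texts) for c in CLASSES}


# Hypothetical toy annotations standing in for a human-annotated training set.
train_texts = [
    "harvest losses as wheat and maize wither in the dry fields",
    "farmers report crop failure after weeks without rain",
    "bark beetle spreads through drought stressed spruce forest",
    "forest fires and dying trees in the pine woods",
    "cattle farmers run short of hay and fodder for their herds",
    "dairy cows sold early as pasture dries out",
    "city council debates new parking rules",
]
train_labels = [{"agriculture"}, {"agriculture"}, {"forestry"}, {"forestry"},
                {"livestock"}, {"livestock"}, set()]

models = train_one_vs_rest(train_texts, train_labels)
test_texts = ["spruce forest hit by bark beetle", "cattle herds short of fodder"]
test_labels = [{"forestry"}, {"livestock"}]
print(per_class_accuracy(models, test_texts, test_labels))
```

The one-vs-rest design lets a single article carry several impact classes at once, which suits newspaper reports that mention, for example, both crop and livestock losses. A real pipeline would also need the location and date extraction the abstract mentions, which this sketch leaves out.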

How to cite: Sodoge, J., de Brito, M. M., and Kuhlicke, C.: Automatized drought impact detection from newspaper articles using natural language processing and machine learning, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-511, 2022.

