- 1 Eurac Research, Climate Change and Transformation, Bolzano, Italy (stefano.terzi@eurac.edu)
- 2 Eurac Research, Institute for Applied Linguistics, Bolzano, Italy
- 3 Department of Agriculture, Food, Environment and Forestry, University of Florence, Firenze, Italy
- 4 CIMA Research Foundation, Savona, Italy
Research on climate extremes, particularly droughts, is largely limited by the lack of impact data. Available impact data are often sparse, inaccessible, or entirely absent. This also holds for mountain areas, which host important and interconnected environmental and socio-economic systems and are increasingly affected by droughts, yet have little to no data coverage.
This work explores the use of textual data from online Italian newspaper articles, blogs, and reports to collect information on drought impacts across socio-economic sectors and regions of the Italian Alps. In particular, we developed a pipeline to create an open database of drought news reporting. We used natural language processing (NLP) methods to automatically (i) extract news articles from Google News using drought-related keywords in Italian, (ii) filter and clean the retrieved articles, extracting their text bodies, and (iii) classify them, identifying the impacted sectors (e.g., agriculture, hydropower, tourism) and regions. We evaluated the performance of different state-of-the-art NLP models on the chosen classification tasks (e.g., relevance to the drought topic, extraction of the impacted location) based on both standard NLP metrics and (environmental) resource consumption criteria.
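The three steps above (keyword-based selection, cleaning, and classification) can be illustrated with a minimal Python sketch. It is not the pipeline used in this work: the keyword lists, function names, and record layout are hypothetical, and simple keyword matching stands in for the NLP models that were actually evaluated. Article retrieval from Google News is omitted; the sketch starts from raw HTML strings.

```python
import re

# Hypothetical keyword lists, chosen for illustration only.
DROUGHT_KEYWORDS = ("siccità", "crisi idrica", "carenza idrica")
SECTOR_KEYWORDS = {
    "agriculture": ("agricol", "raccolt", "irrigazione"),
    "hydropower": ("idroelettric",),
    "tourism": ("turis", "sciistic"),
}


def clean_body(raw_html: str) -> str:
    """Strip HTML tags and collapse whitespace to obtain a plain text body."""
    text = re.sub(r"<[^>]+>", " ", raw_html)
    return re.sub(r"\s+", " ", text).strip()


def is_drought_related(text: str) -> bool:
    """Keyword stand-in for a relevance classifier."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in DROUGHT_KEYWORDS)


def detect_sectors(text: str) -> list[str]:
    """Keyword stand-in for a sector classifier; returns sorted sector labels."""
    lowered = text.lower()
    return sorted(
        sector
        for sector, stems in SECTOR_KEYWORDS.items()
        if any(stem in lowered for stem in stems)
    )


def process(articles: list[str]) -> list[dict]:
    """Clean each article, keep drought-related ones, and tag impacted sectors."""
    records = []
    for raw in articles:
        body = clean_body(raw)
        if not is_drought_related(body):
            continue
        records.append({"text": body, "sectors": detect_sectors(body)})
    return records


if __name__ == "__main__":
    sample = [
        "<p>La siccità riduce l'irrigazione nei campi del Trentino.</p>",
        "<p>Festa del paese sabato in piazza.</p>",
    ]
    for record in process(sample):
        print(record["sectors"], record["text"])
```

In a realistic setting, each keyword function would be replaced by a trained or prompted model, and every label would be stored with a reliability estimate.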
Preliminary results show a correspondence between the frequency of harvested drought impact news and the general trend of drought conditions in northern Italy (e.g., peaks in the number of news items in summer 2022 and spring 2023). Around 60% of the collected news items were classified as relevant to the drought topic, 35% as explicitly covering drought impacts, and 15% as dealing with drought damages in detail. For the detection of impacted sectors and locations within news bodies, a more complex task, the selected models showed varied performance, with results highly dependent on the structure and context of each news item.
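The frequency comparison described above relies on counting news items per time period. The following Python sketch shows one way such counts could be produced; the function names and the publication dates are invented for illustration and are not data from this study.

```python
from collections import Counter
from datetime import date


def monthly_counts(publication_dates: list[date]) -> Counter:
    """Count news items per calendar month, keyed as 'YYYY-MM'."""
    return Counter(d.strftime("%Y-%m") for d in publication_dates)


def peak_month(counts: Counter) -> str:
    """Return the month with the most news items (first one seen on ties)."""
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    # Made-up publication dates, for illustration only.
    dates = [
        date(2022, 7, 1),
        date(2022, 7, 15),
        date(2022, 7, 28),
        date(2023, 4, 2),
        date(2023, 4, 20),
        date(2023, 11, 5),
    ]
    counts = monthly_counts(dates)
    for month in sorted(counts):
        print(month, counts[month])
    print("peak:", peak_month(counts))
```

The resulting monthly series could then be set against a drought indicator for the same region and period to check whether peaks coincide.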
Overall, this study (i) presents a workflow to collect drought impact data for the Italian Alps into an open database, enabling near-real-time drought impact monitoring, (ii) enriches the database with information on news relevance to the drought topic, documented impacts, and mentioned locations, including reliability estimates for the assigned classifications, (iii) offers methodological guidance for future research by reporting the best-performing algorithms and environmental cost criteria, and (iv) has the potential to be transferred to other areas, languages, or natural hazards to improve the understanding of the impacts of climate extremes and to implement targeted and effective adaptation strategies.
How to cite: Terzi, S., Pomella, A., Frey, J.-C., Piemontese, L., Cremonese, E., and Pittore, M.: Advancing drought impact data collection for the Italian Alps through automatic harvesting and analysis of textual data, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-10520, https://doi.org/10.5194/egusphere-egu25-10520, 2025.