EGU24-8923, updated on 08 Mar 2024
https://doi.org/10.5194/egusphere-egu24-8923
EGU General Assembly 2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Enhanced drought impact monitoring: integrating automated search, translations, and text analysis into online media report scraping

Monika Bláhová1, Veit Blauhut2, Mirko D’Andrea3, Lauro Rossi3, Kerstin Stahl4, and Kathrin Szillat4
Monika Bláhová et al.
  • 1Global Change Research Institute CAS, Brno, Czechia
  • 2Saxon State Ministry for Energy, Climate Protection, Environment and Agriculture, Dresden, Germany
  • 3CIMA research foundation, Savona, Italy
  • 4University of Freiburg, Freiburg, Germany

Droughts are among the most destructive natural disasters affecting millions worldwide, profoundly impacting society and ecosystems. The demand for effective drought impact monitoring and reporting systems was proven to be crucial for timely mitigation and response. Traditionally, drought impact monitoring systems rely heavily on manual processing analysis and validation of physical and online reports or costly clipping databases, often lacking real-time information. The manual processing of drought impact reports is not only time-consuming but also prone to inconsistencies and delays in the long term. The sheer volume of data generated daily demands significant human resources, often leading to escalated costs and low viability of the final drought impact databases. These challenges underscore the need for more efficient, cost-effective, and reliable methods to process and analyze drought-related data. Recent advancements in large language models (LLM) and artificial intelligence (AI) tools have opened new pathways for enhancing drought impact monitoring systems. The recent EDORA (The European Drought Observatory for Resilience and Adaptation) project enabled us to employ these novel methods, facilitating the task of populating the European Drought Impact Database (EDID). The specific methodology and workflow we tested involved three steps: (1) automated searching for drought impact-related online media posts, (2) automated text translations, and (3) automated text content analysis. Step (1) of the workflow involved employing Google News Archive Search for EU countries in 2000-2022 to scrape relevant online media reports automatically. Searching was based upon predefined search queries translated into all official EU languages. The media report's content was acquired using the trafilatura Python package. A large number of reports found this way were then, in Step (2), translated to the English language using Amazon AWS Translation service. In order to support a correct selection and classification of the drought impact database’s structure, Step (3) was necessary. The translated reports were further analyzed and classified using the GPT 3.5 API, extracting structured data from unstructured text. Thanks to this semi-automated workflow, we analyzed over 60000 online reports and included over 700 additional entries to the EDID. The difference in these numbers shows that the multi-step workflow is necessary to select only those reports that comply with the drought impact definitions of EDID. The contribution will illustrate the difficulties and successes in each step with specific examples. In conclusion, integrating LLM and AI tools into drought impact monitoring systems presents a significant leap forward in our ability to process vast amounts of data quickly and accurately. While some expert decisions are still necessary in our workflow, this innovation reduces the reliance on manual labor and associated costs.  From an operational risk management perspective, it enhances the responsiveness and effectiveness of drought impact reporting. As we continue to refine and expand these technologies, we anticipate a future where real-time, accurate drought impact monitoring is not just a possibility but a reality.

How to cite: Bláhová, M., Blauhut, V., D’Andrea, M., Rossi, L., Stahl, K., and Szillat, K.: Enhanced drought impact monitoring: integrating automated search, translations, and text analysis into online media report scraping, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-8923, https://doi.org/10.5194/egusphere-egu24-8923, 2024.