EGU26-3402, updated on 13 Mar 2026
https://doi.org/10.5194/egusphere-egu26-3402
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Tuesday, 05 May, 10:45–12:30 (CEST), Display time Tuesday, 05 May, 08:30–12:30
 
Hall A, A.56
Constructing a global ground truth: A news-derived dataset for socioeconomic drought event validation
Yonatan Nakar, Grey Nearing, Rotem Mayo, Oleg Zlydenko, Frederik Kratzert, Moral Bootbool, Amitay Sicherman, Ido Zemach, and Deborah Cohen
  • Google Research

Meteorological drought indices (e.g., SPI) and composite products (e.g., USDM) serve as standard benchmarks for evaluating drought forecasting models. However, these metrics are physical proxies rather than direct measures of societal impact: a precipitation deficit does not always translate into drought impacts on the ground. When a drought does affect agriculture, water supply, or ecosystems, it is typically reported in local or national media. To capture this reality, we introduce a comprehensive global dataset of socioeconomic drought events, designed to serve as an independent ground truth for model validation.

Our approach uses a scalable, two-stage pipeline. We first filter global web news data to identify candidate articles, then run a targeted analysis of approximately 600,000 texts using Gemini. Unlike traditional keyword scraping, the LLM allows for nuanced semantic filtering: it explicitly distinguishes between natural drought events and water scarcity driven by infrastructure failure or mismanagement, so that the dataset reflects climatological hazards rather than human operational errors.
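
The two-stage structure can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the keyword list, the `Article` record, the label strings, and `toy_classifier`, which is a rule-based stand-in for the LLM call so that the example runs offline. It is not the prompt, model interface, or filtering logic used in this work.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical terms for the cheap first-stage filter (not from the abstract).
DROUGHT_TERMS = ("drought", "dry spell", "water shortage", "crop failure")


@dataclass
class Article:
    doc_id: str
    text: str


def stage1_candidates(articles: Iterable[Article]) -> List[Article]:
    """Stage 1: keep articles that mention any drought-related term."""
    return [a for a in articles
            if any(term in a.text.lower() for term in DROUGHT_TERMS)]


def stage2_events(candidates: Iterable[Article],
                  classify: Callable[[str], str]) -> List[Article]:
    """Stage 2: keep candidates that the classifier labels as natural drought."""
    return [a for a in candidates if classify(a.text) == "natural_drought"]


def toy_classifier(text: str) -> str:
    """Stand-in for the LLM: flags obvious infrastructure causes by rule."""
    lowered = text.lower()
    if "pipe burst" in lowered or "mismanagement" in lowered:
        return "infrastructure"
    return "natural_drought"


articles = [
    Article("a1", "Prolonged drought has wiped out maize harvests in the valley."),
    Article("a2", "Water shortage after a pipe burst left the town dry for days."),
    Article("a3", "Local team wins the regional football cup."),
]

candidates = stage1_candidates(articles)
events = stage2_events(candidates, toy_classifier)
print([a.doc_id for a in candidates], [a.doc_id for a in events])
```

Passing the classifier in as a callable keeps the second stage independent of any particular model client, which would be swapped in where `toy_classifier` is used.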

The resulting dataset provides verifiable event timelines for specific geographic regions. We extract precise location names from the text and map them to geospatial polygons, creating a structured record of where and when impacts occurred.
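
As a rough sketch of what such a structured record might look like, the Python below resolves an extracted location name against a gazetteer and attaches the resulting polygon to an event. The `DroughtEvent` fields, the `to_event` helper, and the `TOY_GAZETTEER` entries (placeholder names with made-up rectangular outlines) are all hypothetical; the abstract does not specify the schema or the geocoding source.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple

Polygon = List[Tuple[float, float]]  # ring of (lon, lat) vertices

# Hypothetical gazetteer: placeholder region names with made-up outlines.
TOY_GAZETTEER: Dict[str, Polygon] = {
    "region a": [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)],
    "region b": [(5.0, 5.0), (6.0, 5.0), (6.0, 7.0), (5.0, 7.0)],
}


@dataclass
class DroughtEvent:
    location: str
    start: date
    end: date
    polygon: Polygon


def to_event(location: str, start: date, end: date,
             gazetteer: Dict[str, Polygon] = TOY_GAZETTEER
             ) -> Optional[DroughtEvent]:
    """Look up a normalized location name; return None if it cannot be resolved."""
    polygon = gazetteer.get(location.strip().lower())
    if polygon is None:
        return None
    return DroughtEvent(location.strip(), start, end, polygon)


event = to_event("  Region A ", date(2024, 1, 1), date(2024, 3, 31))
unresolved = to_event("Nowhere", date(2024, 1, 1), date(2024, 3, 31))
print(event)
print(unresolved)
```

Returning `None` for unresolved names, rather than guessing a polygon, keeps unmapped reports out of the validation set.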

To use this dataset for validation, we propose a "3D Event Matching" strategy. We aggregate a given model’s pixel-wise forecasts into continuous spatiotemporal objects ("blobs") and compare them against the reported news polygons. This allows us to validate physical models against the entire lifecycle of a drought event, rather than requiring pixel-perfect alignment with isolated reports.
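
One way to picture this matching is the pure-Python sketch below. It makes several assumptions that the abstract does not state: forecasts are thresholded into a set of `(time, row, col)` cells, blobs are formed with 6-connectivity, a news polygon has already been rasterized to a set of grid cells, and a blob counts as matched if it overlaps the reported region at any time within the reported window. The function names are illustrative.

```python
from collections import deque
from typing import List, Set, Tuple

Cell = Tuple[int, int, int]  # (time, row, col)

# 6-connectivity: neighbours differ by one step along exactly one axis.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def label_blobs(cells: Set[Cell]) -> List[Set[Cell]]:
    """Group forecast cells into connected spatiotemporal blobs (flood fill)."""
    remaining = set(cells)
    blobs = []
    while remaining:
        seed = remaining.pop()
        blob, queue = {seed}, deque([seed])
        while queue:
            t, r, c = queue.popleft()
            for dt, dr, dc in NEIGHBOURS:
                nxt = (t + dt, r + dr, c + dc)
                if nxt in remaining:
                    remaining.remove(nxt)
                    blob.add(nxt)
                    queue.append(nxt)
        blobs.append(blob)
    # Largest blob first, so the ordering is deterministic for this example.
    return sorted(blobs, key=len, reverse=True)


def blob_matches_event(blob: Set[Cell], region_cells: Set[Tuple[int, int]],
                       t_start: int, t_end: int) -> bool:
    """True if the blob touches the reported region within the reported window."""
    return any(t_start <= t <= t_end and (r, c) in region_cells
               for t, r, c in blob)


# Forecast: one blob that drifts from cell (0, 0) to (0, 1), plus an isolated cell.
forecast = {(0, 0, 0), (1, 0, 0), (1, 0, 1), (2, 0, 1), (5, 3, 3)}
blobs = label_blobs(forecast)

# News event: reported in grid cell (0, 1) during time steps 2 to 3.
matched = [b for b in blobs if blob_matches_event(b, {(0, 1)}, 2, 3)]
print(len(blobs), len(matched))
```

In this toy case the forecast cell at time step 0 lies outside both the reported region and the reported window, yet it is credited to the event because it belongs to the same blob, which is the point of matching objects rather than individual pixels.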

By providing a global, independent record of when and where droughts were actually felt by society, this work offers a necessary complement to physical and reanalysis data for next-generation drought forecast model development.

How to cite: Nakar, Y., Nearing, G., Mayo, R., Zlydenko, O., Kratzert, F., Bootbool, M., Sicherman, A., Zemach, I., and Cohen, D.: Constructing a global ground truth: A news-derived dataset for socioeconomic drought event validation, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-3402, https://doi.org/10.5194/egusphere-egu26-3402, 2026.