Google Research
Meteorological drought indices (e.g., the Standardized Precipitation Index, SPI) and composite products (e.g., the U.S. Drought Monitor, USDM) are standard benchmarks for evaluating drought forecasting models. However, these metrics are physical proxies rather than direct measures of societal impact: a precipitation deficit does not always produce a drought that affects people. Yet when a drought does affect agriculture, water supply, or ecosystems, it is typically reported in local or national media. To capture these impacts, we introduce a comprehensive global dataset of socioeconomic drought events, designed to serve as an independent ground truth for model validation.
Our approach uses a scalable, two-stage pipeline. We first filter global web news data to identify candidate articles, then analyze approximately 600,000 texts with Gemini, a large language model (LLM). Unlike traditional keyword scraping, the LLM enables nuanced semantic filtering: it explicitly distinguishes natural drought events from water scarcity caused by infrastructure failure or mismanagement, so that the dataset reflects climatological hazards rather than human operational errors.
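The two-stage structure can be sketched as a cheap lexical prefilter followed by a semantic classifier. This is a minimal illustration, not the authors' pipeline: the keyword list `DROUGHT_TERMS`, the label set `LABELS`, and the functions `keyword_prefilter`, `semantic_filter`, and `toy_classifier` are all hypothetical. In the real system the `classify` callable would wrap a Gemini request; here a toy rule-based stand-in is used so the example runs offline, and the article strings are made up.

```python
from typing import Callable, Iterable

# Stage 1 keywords: an illustrative list assumed for this sketch, not the authors' filter.
DROUGHT_TERMS = ("drought", "dry spell", "water shortage", "crop failure")

# Hypothetical label set for the stage-2 semantic classifier.
LABELS = ("natural_drought", "human_caused_scarcity", "not_relevant")


def keyword_prefilter(articles: Iterable[str]) -> list[str]:
    """Stage 1: cheap lexical filter that keeps candidate articles."""
    return [a for a in articles if any(t in a.lower() for t in DROUGHT_TERMS)]


def semantic_filter(candidates: Iterable[str], classify: Callable[[str], str]) -> list[str]:
    """Stage 2: label each candidate and keep only natural-drought reports.

    `classify` stands in for an LLM call; it must return one of LABELS.
    """
    kept = []
    for text in candidates:
        label = classify(text)
        if label not in LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        if label == "natural_drought":
            kept.append(text)
    return kept


def toy_classifier(text: str) -> str:
    """Toy rule-based stand-in for the LLM, for demonstration only."""
    lowered = text.lower()
    if any(w in lowered for w in ("pipe", "burst", "mismanagement")):
        return "human_caused_scarcity"
    return "natural_drought"


articles = [
    "Drought cuts maize harvest across the province",
    "Water shortage downtown after a pipe burst",
    "Local team wins the cup final",
]
candidates = keyword_prefilter(articles)
kept = semantic_filter(candidates, toy_classifier)
```

Both the drought report and the pipe-burst report pass the keyword stage, which is exactly the case a semantic second stage is meant to separate.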
The resulting dataset provides verifiable event timelines for specific geographic regions: we extract precise location names from each article and map them to geospatial polygons, producing a structured record of where and when impacts occurred.
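One way to turn located reports into per-region timelines is to resolve each place name to a polygon ID and merge report dates that fall close together into intervals. The sketch below assumes a simple dictionary gazetteer and a 30-day merge gap; `ImpactReport`, `build_timelines`, `max_gap_days`, and the polygon IDs are hypothetical, and the abstract does not state how the authors geocode names or define event boundaries.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ImpactReport:
    location_name: str  # place name extracted from an article
    published: date     # article date, used here as a proxy for impact timing


def build_timelines(reports, gazetteer, max_gap_days=30):
    """Group reports by polygon and merge nearby dates into (start, end) intervals.

    `gazetteer` maps normalized place names to polygon IDs; names it cannot
    resolve are skipped. Consecutive dates at most `max_gap_days` apart are
    merged into one interval.
    """
    by_polygon = defaultdict(list)
    for r in reports:
        pid = gazetteer.get(r.location_name.strip().lower())
        if pid is not None:
            by_polygon[pid].append(r.published)

    gap = timedelta(days=max_gap_days)
    timelines = {}
    for pid, dates in by_polygon.items():
        dates.sort()
        intervals = [[dates[0], dates[0]]]
        for d in dates[1:]:
            if d - intervals[-1][1] <= gap:
                intervals[-1][1] = d  # extend the current interval
            else:
                intervals.append([d, d])  # start a new interval
        timelines[pid] = [tuple(iv) for iv in intervals]
    return timelines


# Hypothetical gazetteer and reports, for illustration only.
gazetteer = {"region a": "poly_a", "region b": "poly_b"}
reports = [
    ImpactReport("Region A", date(2024, 1, 5)),
    ImpactReport("region a ", date(2024, 1, 20)),
    ImpactReport("Region A", date(2024, 4, 1)),
    ImpactReport("Region B", date(2024, 2, 10)),
    ImpactReport("Unknown Place", date(2024, 2, 11)),
]
timelines = build_timelines(reports, gazetteer)
```

The two January reports for Region A collapse into a single interval, while the April report starts a new one; a production system would likely replace the dictionary with a geocoder that returns administrative-boundary polygons.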
To use this dataset for validation, we propose a "3D Event Matching" strategy. We aggregate a model's pixel-wise forecasts into continuous spatiotemporal objects ("blobs") spanning two spatial dimensions and time, and compare them against the reported news polygons. This lets us validate physical models against the full lifecycle of a drought event rather than requiring pixel-perfect alignment with isolated reports.
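A minimal sketch of 3D event matching, under several assumptions the abstract does not specify: the forecast has already been thresholded into a boolean (time, y, x) drought mask on a regular grid, blobs are 6-connected components, each news event has been rasterized to a 2D region mask with a start and end time index, and an event counts as detected if any blob shares at least one voxel with it. The function names `label_blobs` and `match_events` and the returned probability of detection (POD) and blob-level false-alarm ratio (FAR) are illustrative choices, not the authors' metrics.

```python
from collections import deque

import numpy as np

# 6-connectivity in (time, y, x): neighbours share a face in space or are the
# same pixel at adjacent time steps.
OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def label_blobs(mask):
    """Label connected components of a boolean (time, y, x) array via BFS."""
    labels = np.zeros(mask.shape, dtype=int)
    n = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue  # voxel already belongs to a blob
        n += 1
        labels[start] = n
        queue = deque([start])
        while queue:
            t, y, x = queue.popleft()
            for dt, dy, dx in OFFSETS:
                nb = (t + dt, y + dy, x + dx)
                inside = all(0 <= c < s for c, s in zip(nb, mask.shape))
                if inside and mask[nb] and not labels[nb]:
                    labels[nb] = n
                    queue.append(nb)
    return labels, n


def match_events(drought_mask, events):
    """Score blobs against news events given as (t_start, t_end, region_mask).

    An event counts as detected if any blob shares at least one voxel with the
    event's footprint (its region repeated over t_start..t_end inclusive).
    Returns (probability of detection, false-alarm ratio over blobs).
    """
    labels, n = label_blobs(drought_mask)
    matched = set()
    hits = 0
    for t0, t1, region in events:
        footprint = np.zeros(drought_mask.shape, dtype=bool)
        footprint[t0:t1 + 1] = region  # broadcast 2D region over time steps
        ids = {int(i) for i in np.unique(labels[footprint])} - {0}
        if ids:
            hits += 1
            matched |= ids
    pod = hits / len(events) if events else float("nan")
    far = (n - len(matched)) / n if n else 0.0
    return pod, far


# Toy example: 4 time steps on a 3x3 grid (synthetic data).
forecast_mask = np.zeros((4, 3, 3), dtype=bool)
forecast_mask[0:2, 0, 0] = True  # blob 1: one pixel persisting over two steps
forecast_mask[3, 2, 2] = True    # blob 2: a single isolated voxel

region_a = np.zeros((3, 3), dtype=bool)
region_a[0, 0] = True
region_b = np.zeros((3, 3), dtype=bool)
region_b[1, 1] = True
events = [(1, 2, region_a), (0, 0, region_b)]

labels, n_blobs = label_blobs(forecast_mask)
pod, far = match_events(forecast_mask, events)
```

Because blob 1 persists through time step 1, it matches event A even though the report window (steps 1 to 2) only partially overlaps it, which is the kind of lifecycle-level tolerance the strategy aims for. For large grids, `scipy.ndimage.label` with a 3D structuring element performs the same connected-component labelling far more efficiently than this pure-Python BFS.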
By providing a global, independent record of when and where droughts were actually felt by society, this work offers a necessary complement to physical and reanalysis data for developing next-generation drought forecasting models.
How to cite: Nakar, Y., Nearing, G., Mayo, R., Zlydenko, O., Kratzert, F., Bootbool, M., Sicherman, A., Zemach, I., and Cohen, D.: Constructing a global ground truth: A news-derived dataset for socioeconomic drought event validation, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-3402, https://doi.org/10.5194/egusphere-egu26-3402, 2026.