- Google Research, Israel
A critical barrier to advancing large-sample hydrology and global risk assessment is the absence of a comprehensive, high-resolution historical event dataset. Existing resources are often geographically constrained, lack temporal or spatial precision, or are too sparse to support robust global synthesis. To address this gap, we introduce Groundsource, a novel, large-scale global dataset of historical flood events automatically constructed from diverse online news sources. By leveraging Google’s unique web page annotation capabilities and Gemini's natural language processing, we developed a pipeline to systematically identify and structure information about real-world flood events.
Our methodology first filters millions of news articles to isolate reports of actual, past floods, distinguishing them from warnings, policy discussions, and articles that mention floods in other contexts. For each relevant article, we prompt Gemini to extract the specific dates and locations of the flooding. This structured data is then geocoded and aggregated to produce the Groundsource dataset. The dataset contains ~800,000 events with an estimated 75% precision.
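To make the final aggregation step concrete, the following is a minimal sketch of how article-level extractions might be merged into events. It is an illustration under stated assumptions, not the Groundsource implementation: the `Extraction` schema, the `location_id` field, the `aggregate_events` function, and the one-day merging window are all hypothetical, and the filtering, Gemini extraction, and geocoding steps are assumed to have already run.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Extraction:
    """One (location, date) pair extracted from one article (hypothetical schema)."""
    article_id: str
    location_id: str  # assumed geocoder output, e.g. an administrative-region identifier
    flood_date: date


def aggregate_events(extractions, max_gap_days=1):
    """Merge extractions into events.

    Assumed rule: reports for the same location whose dates are at most
    `max_gap_days` apart belong to the same flood event.
    """
    by_location = defaultdict(list)
    for ex in extractions:
        by_location[ex.location_id].append(ex)

    events = []
    for location_id, items in by_location.items():
        items.sort(key=lambda ex: ex.flood_date)
        current = None
        for ex in items:
            if current and (ex.flood_date - current["end"]).days <= max_gap_days:
                # Close enough in time: extend the running event.
                current["end"] = ex.flood_date
                current["articles"].add(ex.article_id)
            else:
                # First report for this location, or a gap: start a new event.
                current = {
                    "location_id": location_id,
                    "start": ex.flood_date,
                    "end": ex.flood_date,
                    "articles": {ex.article_id},
                }
                events.append(current)
    return events


# Made-up sample records, for illustration only.
sample = [
    Extraction("a1", "region-A", date(2024, 5, 1)),
    Extraction("a2", "region-A", date(2024, 5, 2)),
    Extraction("a3", "region-A", date(2024, 5, 10)),
    Extraction("a4", "region-B", date(2024, 5, 1)),
]
events = aggregate_events(sample)
```

In this sketch, the two consecutive-day reports for `region-A` collapse into one event supported by two articles, while the later report and the `region-B` report remain separate events. Counting supporting articles per event is one plausible way to rank events by confidence, although the abstract does not state how precision is estimated.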
While we acknowledge the limited accuracy of LLM-based data extraction and the inherent limitations of a news-based approach, such as recency, population, and coverage biases, Groundsource represents a significant leap forward in data availability. As a publicly available, open resource covering over 100 countries, it provides a tool of unprecedented scale. Groundsource enables the research community to investigate global flood seasonality and temporal trends, to synthesize the socio-hydrological footprint of extreme events worldwide, to train data-driven models, and to validate global flood forecasting systems.
How to cite: Zlydenko, O., Mayo, R., Bootbool, M., Kratzert, F., Sicherman, A., Zemach, I., and Cohen, D.: Groundsource - a Gemini constructed dataset of real world flood events from news, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-3253, https://doi.org/10.5194/egusphere-egu26-3253, 2026.