EGU26-7185, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-7185
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Friday, 08 May, 09:55–10:05 (CEST)
Room 2.44
Toward a National Database of Boil Water Advisories in the United States Using Web Scraping and Large Language Models
Eli Cook (1), Landon Marston (2), and Alasdair Cohen (3)
  • (1) Virginia Tech, Civil Engineering, United States of America (elidcook51@gmail.com)
  • (2) Virginia Tech, Department of Civil and Environmental Engineering
  • (3) Virginia Tech, Department of Population Health Sciences

Boil water advisories (BWAs) are essential public health alerts issued when drinking water safety is compromised, yet the United States lacks a centralized database to track these events. Such a dataset would enable epidemiological studies, infrastructure resilience assessments, and policy analysis to better understand advisory causes, impacts, and regional disparities. This research introduces a scalable framework for building this database and a generalizable methodology for converting unstructured online information into machine-readable datasets. Our approach integrates automated web scraping with large language models (LLMs) to extract and standardize advisory attributes such as location, duration, and cause. Preliminary validation compares U.S. data against ground-truth datasets from Canada and Kentucky to assess coverage and accuracy, with early findings indicating substantial capture of advisories despite variability in reporting formats. Future work will refine search strategies to improve precision and extend this methodology to other domains lacking centralized data, such as water quality violations and emergency notifications. This study demonstrates the potential of combining web scraping and LLM-based text processing to address critical data gaps in environmental and public health monitoring.
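
As a rough illustration of the standardization and validation steps described above, the following Python sketch turns one JSON object (of the kind an LLM might be prompted to return for a scraped notice) into a structured record, then estimates coverage against a ground-truth list. This is a minimal sketch under stated assumptions, not the authors' implementation: the field names (`location`, `issued`, `lifted`, `cause`), the `Advisory` class, the matching rule (same location and issue date), and the sample values are all hypothetical, and the scraping and LLM calls themselves are omitted.

```python
import json
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Advisory:
    """One standardized boil water advisory (hypothetical schema)."""
    location: str
    issued: date
    lifted: Optional[date]
    cause: str

    @property
    def duration_days(self) -> Optional[int]:
        # Duration is unknown while the advisory has no recorded end date.
        return None if self.lifted is None else (self.lifted - self.issued).days


def parse_llm_output(raw: str) -> Optional[Advisory]:
    """Validate one JSON object into an Advisory; return None if malformed."""
    try:
        data = json.loads(raw)
        lifted = data.get("lifted")
        return Advisory(
            location=" ".join(data["location"].split()),  # collapse whitespace
            issued=date.fromisoformat(data["issued"]),
            lifted=date.fromisoformat(lifted) if lifted else None,
            cause=(data.get("cause") or "unknown").strip().lower(),
        )
    except (KeyError, ValueError, TypeError, AttributeError):
        # Missing fields, invalid JSON, bad dates, or wrong types.
        return None


def coverage(scraped: Iterable[Advisory], ground_truth: Iterable[Advisory]) -> float:
    """Fraction of ground-truth advisories matched on (location, issue date)."""
    def key(a: Advisory):
        return (a.location.casefold(), a.issued)

    found = {key(a) for a in scraped}
    truth = list(ground_truth)
    if not truth:
        return 0.0
    return sum(key(g) in found for g in truth) / len(truth)


# Made-up example values, for illustration only.
raw = ('{"location": " Blacksburg,  VA ", "issued": "2025-01-10", '
       '"lifted": "2025-01-13", "cause": "Main Break"}')
record = parse_llm_output(raw)
truth = [
    Advisory("blacksburg, va", date(2025, 1, 10), None, "main break"),
    Advisory("Radford, VA", date(2025, 2, 1), None, "unknown"),
]
```

Matching on exact location and issue date is deliberately strict; a real comparison would likely need fuzzier matching of place names and some tolerance on dates, given the variability in reporting formats noted above.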

How to cite: Cook, E., Marston, L., and Cohen, A.: Toward a National Database of Boil Water Advisories in the United States Using Web Scraping and Large Language Models, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7185, https://doi.org/10.5194/egusphere-egu26-7185, 2026.