EGU25-19958, updated on 15 Mar 2025
https://doi.org/10.5194/egusphere-egu25-19958
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
PICO | Monday, 28 Apr, 08:49–08:51 (CEST)
 
PICO spot A
AquaFetch: A Unified Python Interface for Water Resource Dataset Acquisition and Harmonization
Sara Iftikhar, Ather Abbas, and Hylke Beck
  • King Abdullah University of Science and Technology, BESE, Saudi Arabia (sara.iftikhar@kaust.edu.sa)

In recent years, there has been a significant increase in the release of datasets across various domains, including water resources. This surge is driven by advancements in computational and storage technologies, as well as the growing need to develop robust, accurate data-driven solutions to address challenges such as climate change, water scarcity, and environmental pollution. As a result, a wealth of national and global spatio-temporal datasets has become freely accessible online. These datasets are invaluable for applications like flood forecasting, climate change analysis, aquatic ecosystem management, improving drinking water safety, and optimizing wastewater treatment processes.

Despite the availability of these datasets, importing them into Python remains cumbersome. Researchers must often sift through multiple sources, including search engines, GitHub repositories, and various websites, to locate the necessary data. The diversity of data providers means datasets are frequently presented in inconsistent units and stored in varying formats. Additionally, many datasets require extensive preprocessing before they can be used for analysis or modeling. This makes acquiring, cleaning, organizing, and managing data a complex task requiring advanced data handling skills.

These challenges highlight the need for a unified, consistent, automated, and reusable framework for extracting hydrological and environmental data. The AquaFetch package addresses this gap by building on data-handling tools such as Pandas, NumPy, xarray, and Shapely to offer a streamlined workflow for automatic data extraction from multiple sources in various formats.

AquaFetch is a Python package designed for the automated downloading, parsing, cleaning, and harmonization of freely available water resource datasets related to rainfall-runoff processes, surface water quality, and wastewater treatment. The package currently supports 66 datasets, downloading raw data and transforming it into consistent, easy-to-use, analysis-ready data. This allows users to access and use the data directly, without labor-intensive and time-consuming preprocessing.
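To make the idea of harmonization concrete, the following is a minimal, standard-library-only sketch of one such step: bringing streamflow records reported in different units to a common unit. It is not the package's implementation; the function name `harmonize_streamflow`, the record layout, and the output key `q_cms` are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: names and record layout are assumptions,
# not part of any real package API.

# Conversion factors to cubic metres per second (m^3/s).
TO_CUBIC_METRES_PER_SECOND = {
    "m3/s": 1.0,
    "l/s": 0.001,             # 1 litre = 0.001 m^3
    "ft3/s": 0.028316846592,  # 1 ft = 0.3048 m exactly
}


def harmonize_streamflow(records):
    """Return records with streamflow expressed in m^3/s under a common key."""
    harmonized = []
    for record in records:
        # Unit labels are matched case-insensitively; unknown units raise KeyError.
        factor = TO_CUBIC_METRES_PER_SECOND[record["unit"].lower()]
        harmonized.append(
            {
                "station": record["station"],
                "date": record["date"],
                "q_cms": record["value"] * factor,
            }
        )
    return harmonized


# Two hypothetical providers reporting the same variable in different units.
raw = [
    {"station": "A", "date": "2020-01-01", "value": 100.0, "unit": "ft3/s"},
    {"station": "B", "date": "2020-01-01", "value": 2500.0, "unit": "L/s"},
]
clean = harmonize_streamflow(raw)
```

After this step every record carries the same key and unit, so records from both providers can be compared or concatenated directly.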

The package comprises three submodules, each representing a different type of water resource data: `rr` for rainfall-runoff processes, `wq` for surface water quality, and `wwt` for wastewater treatment. The `rr` submodule offers data for 47,716 catchments worldwide, encompassing both dynamic and static features for each catchment. The dynamic features consist of observed streamflow and meteorological time series, averaged over the catchment area and available at daily or hourly time steps. Static features include time-invariant attributes such as land use, soil, topography, and other physiographical characteristics, along with catchment boundaries. This submodule not only provides access to established rainfall-runoff datasets such as CAMELS and LamaH but also introduces new datasets compiled for the first time from publicly accessible online data sources. The `wq` submodule offers access to 16 surface water quality datasets, each containing various water quality parameters measured at different locations and times. The `wwt` submodule provides access to 22,201 experimental measurements related to wastewater treatment techniques such as adsorption, photocatalysis, and sonolysis.
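The split between static and dynamic catchment features described above can be pictured with a small data structure. This is a hedged sketch using only the standard library; the `Catchment` class, its field names, and the example values are hypothetical and do not reflect how the package stores data internally.

```python
# Illustrative sketch only: the class and its fields are assumptions.
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Catchment:
    catchment_id: str
    # Time-invariant attributes, e.g. area, elevation, land use.
    static: dict = field(default_factory=dict)
    # Time series, stored as feature name -> {date: value}.
    dynamic: dict = field(default_factory=dict)

    def series(self, feature):
        """Return (date, value) pairs for one dynamic feature, sorted by date."""
        return sorted(self.dynamic.get(feature, {}).items())


# Made-up example values for a single catchment.
example = Catchment(
    catchment_id="demo_001",
    static={"area_km2": 152.3, "mean_elevation_m": 640.0},
    dynamic={
        "streamflow_cms": {
            date(2020, 1, 2): 3.4,
            date(2020, 1, 1): 3.1,
        }
    },
)
daily_flow = example.series("streamflow_cms")
```

Keeping the two kinds of features separate means static attributes are stored once per catchment, while each dynamic feature can be retrieved as an ordered time series.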

The development of AquaFetch was inspired by the growing availability of diverse water resource datasets in recent years. As a community-driven project, the codebase is structured so that contributors can easily add new datasets, allowing the package to expand and evolve to meet future needs.

How to cite: Iftikhar, S., Abbas, A., and Beck, H.: AquaFetch: A Unified Python Interface for Water Resource Dataset Acquisition and Harmonization, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-19958, https://doi.org/10.5194/egusphere-egu25-19958, 2025.