EGU General Assembly 2021
© Author(s) 2022. This work is distributed under
the Creative Commons Attribution 4.0 License.

GRQA: Global River Water Quality Archive

Holger Virro1, Giuseppe Amatulli2,3, Alexander Kmoch1, Longzhu Shen4,5, and Evelyn Uuemaa1
  • 1University of Tartu, Institute of Ecology and Earth Sciences, Department of Geography, Tartu, Estonia
  • 2Yale University, School of the Environment, New Haven, CT, 06511, USA
  • 3Yale University, Center for Research Computing, New Haven, CT, 06511, USA
  • 4University of Cambridge, Department of Zoology, Cambridge, CB2 3EJ, UK
  • 5Spatial-Ecology, Meaderville House, Wheal Buller, Redruth, TR16 6ST, UK

Recent advances in applying machine learning (ML) methods in hydrology have given rise to a new, data-driven approach to hydrological modeling. Comparisons of physically based and ML approaches have shown that ML methods can achieve accuracy similar to that of physically based models and can outperform them when describing nonlinear relationships. Global ML models have already been applied successfully to model hydrological phenomena such as discharge.

However, a major obstacle to large-scale water quality modeling has been the lack of available observation data with good spatiotemporal coverage. This has limited both the reproducibility of previous studies and the potential to improve existing models. In addition to gaps in the observation data itself, insufficient or poor-quality metadata has discouraged researchers from integrating the datasets that are already available. Therefore, improving both the availability and the quality of open water quality data would increase the potential for predictive modeling on a global scale.

We aim to address these issues by presenting the new Global River Water Quality Archive (GRQA), which integrates data from five existing global and regional sources:

  • Canadian Environmental Sustainability Indicators program (CESI)
  • Global Freshwater Quality Database (GEMStat)
  • GLObal RIver Chemistry database (GLORICH)
  • European Environment Agency (Waterbase)
  • USGS Water Quality Portal (WQP)

The resulting dataset contains over 14 million observations for 41 different forms of some of the most important water quality parameters, focusing on nutrients, carbon, oxygen and sediments. Supplementary metadata and statistics are provided with the observation time series to improve the usability of the dataset. We report on the development of a harmonized schema and a reproducible workflow that can be adapted to integrate and harmonize further data sources. We conclude with a call to action to extend this dataset, and we hope that the reproducible method of data integration and metadata provenance provided here will serve as an example.
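
As a rough illustration of what harmonizing records from different sources into one schema can involve, the following minimal Python sketch maps source-specific column names onto a common record and converts concentrations to a single unit. It is not the GRQA workflow or schema: the source labels (SOURCE_A, SOURCE_B), all field names, the unit table, and the harmonize function are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical harmonized record. Field names are illustrative only
# and are NOT taken from the actual GRQA schema.
@dataclass(frozen=True)
class Observation:
    source: str
    site_id: str
    obs_date: date
    parameter: str
    value: float
    unit: str


# Hypothetical mapping per source: source column name -> harmonized field.
FIELD_MAPS = {
    "SOURCE_A": {
        "station": "site_id",
        "date": "obs_date",
        "param": "parameter",
        "result": "value",
        "units": "unit",
    },
    "SOURCE_B": {
        "site": "site_id",
        "sampled": "obs_date",
        "analyte": "parameter",
        "val": "value",
        "uom": "unit",
    },
}

# Assumed multiplication factors for converting concentrations to mg/L.
UNIT_FACTORS = {"mg/L": 1.0, "ug/L": 0.001, "g/L": 1000.0}


def harmonize(source: str, row: dict) -> Observation:
    """Rename source columns to the common fields and convert the value to mg/L."""
    mapping = FIELD_MAPS[source]
    fields = {target: row[col] for col, target in mapping.items()}
    factor = UNIT_FACTORS[fields["unit"]]
    return Observation(
        source=source,
        # Prefix the site ID with the source label so IDs stay unique after merging.
        site_id=f"{source}_{fields['site_id']}",
        obs_date=date.fromisoformat(fields["obs_date"]),
        parameter=fields["parameter"].upper(),
        value=float(fields["value"]) * factor,
        unit="mg/L",
    )


# Usage: two rows with different column names and units end up in one format.
rows = [
    ("SOURCE_A", {"station": "001", "date": "2010-03-15", "param": "tn",
                  "result": "1.8", "units": "mg/L"}),
    ("SOURCE_B", {"site": "R42", "sampled": "2015-06-01", "analyte": "tp",
                  "val": "250", "uom": "ug/L"}),
]
harmonized = [harmonize(source, row) for source, row in rows]
```

Under these assumptions, adding a further data source would only require a new entry in the mapping table, plus any unit factors that are still missing.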

How to cite: Virro, H., Amatulli, G., Kmoch, A., Shen, L., and Uuemaa, E.: GRQA: Global River Water Quality Archive, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-3865, 2021.

Corresponding displays formerly uploaded have been withdrawn.