- ¹Utrecht University, Faculty of Geosciences, Physical Geography, Utrecht, Netherlands (e.r.jones@uu.nl)
- ²Google Research, Vienna, Austria
The last decade has seen a proliferation of efforts to compile, standardise and openly disseminate datasets spanning hundreds to thousands of catchments, driven by the emergence of large-sample hydrology as a sub-discipline of the hydrological sciences. While these datasets have facilitated novel research on water quantity (e.g. streamflow prediction), comparable advances in water quality research remain limited¹.
Here, we present the first global integration of water quality into large-sample hydrology (named Caravan-Qual). The dataset contains >70 million river water quality observations covering 100 water quality constituents, compiled from a range of national-to-global datasets covering the period 1980–2025. Water quality data have been standardised to common naming conventions and reporting units, and further processed to remove duplicates, detect outliers and handle observations below detection limits. Leveraging the Caravan² dataset and open-source software, water quality monitoring stations are matched to streamflow gauges, with ~31% of daily water quality observations paired to a daily streamflow measurement within a distance of 10 km. Furthermore, meteorological variables (e.g. temperature, precipitation, net radiation) and catchment attributes (e.g. land cover, soil characteristics) are derived for water quality monitoring stations.
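The station-to-gauge pairing described above can be illustrated with a minimal sketch. It is not the Caravan-Qual processing code: it assumes a straight-line (great-circle) distance and a nearest-gauge rule, whereas the actual matching may use other criteria such as river-network position. All function names, identifiers and data values below are hypothetical; only the 10 km threshold comes from the abstract.

```python
import math
from datetime import date

EARTH_RADIUS_KM = 6371.0
MAX_DISTANCE_KM = 10.0  # distance threshold quoted in the abstract


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_gauge(station, gauges, max_km=MAX_DISTANCE_KM):
    """Return (gauge_id, distance_km) for the closest gauge within max_km, else None.

    Assumption: straight-line distance only; no check that the station and
    the gauge lie on the same river.
    """
    best = None
    for gauge_id, (lat, lon) in gauges.items():
        d = haversine_km(station[0], station[1], lat, lon)
        if d <= max_km and (best is None or d < best[1]):
            best = (gauge_id, d)
    return best


def pair_daily(wq_obs, flow, gauge_id):
    """Keep only water quality observations with a same-day streamflow value."""
    series = flow.get(gauge_id, {})
    return [(day, value, series[day]) for day, value in wq_obs if day in series]


# Hypothetical example data: gauge coordinates, one monitoring station,
# daily streamflow per gauge and daily water quality observations.
gauges = {"G1": (52.09, 5.12), "G2": (48.21, 16.37)}
station = (52.10, 5.15)
flow = {"G1": {date(2020, 1, 1): 12.3, date(2020, 1, 3): 10.8}}
wq_obs = [(date(2020, 1, 1), 7.9), (date(2020, 1, 2), 8.1)]

match = nearest_gauge(station, gauges)
pairs = pair_daily(wq_obs, flow, match[0]) if match else []
```

In this toy case the station is matched to gauge `G1`, and only the observation that shares a date with a streamflow record is retained as a pair. The fraction of observations retained in this way is the kind of quantity summarised by the ~31% figure above.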
Caravan-Qual³ is openly available at https://doi.org/10.5281/zenodo.17787066 and is envisaged to facilitate research into topics including:
- Spatio-temporal analysis of river water quality dynamics at local to global scales.
- Investigation of the relationships between (constituent-specific) river water quality responses and hydrological, meteorological and catchment characteristics.
- The development and evaluation of process-based, hybrid and data-driven water quality models across diverse hydrological and climatic conditions.
References
1. Jones, E. R., Graham, D. J., van Griensven, A., Flörke, M. & van Vliet, M. T. H. Blind spots in global water quality monitoring. Environmental Research Letters 19, 091001 (2024). https://doi.org/10.1088/1748-9326/ad6919
2. Kratzert, F. et al. Caravan - A global community dataset for large-sample hydrology. Scientific Data 10, 61 (2023). https://doi.org/10.1038/s41597-023-01975-w
3. Jones, E. R., Kratzert, F. & van Vliet, M. T. H. Caravan-Qual: A global scale integration of water quality observations into a large sample hydrology dataset. Zenodo [DATASET] (2025). https://doi.org/10.5281/zenodo.17787066
How to cite: Jones, E. R., Kratzert, F., and van Vliet, M. T. H.: Caravan-Qual: A global scale integration of water quality observations into a large-sample hydrology dataset, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-1426, https://doi.org/10.5194/egusphere-egu26-1426, 2026.