Living among Artiodactyls - Current status and future plans of the Caravan dataset
- 1Google Research, Vienna, Austria (kratzert@google.com)
- 2Fathom, Square Works, Bristol, UK
- 3Google Research, Tel Aviv, Israel
High-quality datasets are essential to support hydrological science and modeling. Several datasets exist for specific countries or regions (e.g. the various CAMELS datasets). However, these datasets are not standardized with respect to one another, which makes global studies difficult. Additionally, creating large-sample datasets is a time- and resource-intensive task, which often prevents the release of data that would otherwise be open.
About a year ago, we released the Caravan (as in “a series of camels”) dataset, a community initiative that consists of
- a large-sample hydrology dataset which is derived from globally consistent data sources, and
- open source code that facilitates the creation of Caravan extensions to new regions by leveraging cloud computing on Earth Engine.
At release, the Caravan dataset included daily streamflow records for 6830 gauges in 14 countries (median record length ~30 years), 9 meteorological variables (from 1981 to 2020) in different daily aggregations, 4 hydrological reference states, and a total of 221 catchment attributes.
Since then, community members have extended the dataset with several thousand gauges in various previously uncovered regions. Importantly, GRDC has joined the Caravan community effort and released a Caravan extension for 5357 watersheds in 25 countries, drawn from the GRDC station catalog and covering the period from 1950 to 2022.
With all extensions combined, the Caravan dataset now consists of 22494 gauge stations from 35 countries and contains a total of 660,382 years of streamflow records (the median record length remains ~30 years).
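As a rough illustration of how summary figures of this kind (number of gauges, total years of record, median record length) could be computed, the following Python sketch counts the non-missing daily streamflow values per gauge and converts them to years. The function names, the in-memory data layout, the gauge ids, and the 365.25-day conversion are assumptions made for this example; they are not taken from the Caravan code base, and the numbers reported above may have been derived differently.

```python
import math
from statistics import median

# Assumption: average year length used to convert valid days to years.
DAYS_PER_YEAR = 365.25


def record_length_years(streamflow):
    """Return years of record, counting only non-missing daily values."""
    valid_days = sum(
        1 for q in streamflow if q is not None and not math.isnan(q)
    )
    return valid_days / DAYS_PER_YEAR


def summarize(gauges):
    """Summarize a mapping of gauge id -> list of daily streamflow values."""
    lengths = {gid: record_length_years(q) for gid, q in gauges.items()}
    return {
        "n_gauges": len(lengths),
        "total_years": sum(lengths.values()),
        "median_years": median(lengths.values()) if lengths else 0.0,
    }


# Toy data: hypothetical gauge ids with synthetic daily values.
toy = {
    "gauge_a": [1.0] * 1461,                       # 4 years of valid days
    "gauge_b": [2.0] * 731 + [float("nan")] * 10,  # ~2 years, 10 missing days
    "gauge_c": [0.5] * 2922,                       # 8 years of valid days
}
summary = summarize(toy)
print(summary)
```

Counting only valid days, rather than the span between the first and last observation, keeps gaps in a record from inflating its length; which definition is appropriate depends on the intended use.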
With this submission, we want to reflect in more detail on the current state of the Caravan community effort and share our thoughts and ideas for the future of Caravan. We also welcome exchanges with owners of hydrological datasets who are interested in contributing to Caravan, as well as discussions with users of large-sample datasets, to understand what they need from such datasets and to inform our future efforts. All information on Caravan can be found at https://github.com/kratzert/Caravan/
How to cite: Kratzert, F., Addor, N., Shalev, G., and Gilon, O.: Living among Artiodactyls - Current status and future plans of the Caravan dataset, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12051, https://doi.org/10.5194/egusphere-egu24-12051, 2024.