Caravan - A global community dataset for large-sample hydrology

Frederik Kratzert; Grey Nearing; Nans Addor; Tyler Erickson; Martin Gauch; Oren Gilon; Lukas Gudmundsson; Avinatan Hassidim; Daniel Klotz; Sella Nevo; Guy Shalev; Yossi Matias

doi:https://doi.org/10.5194/egusphere-egu23-5256

EGU23-5256, updated on 22 Feb 2023

EGU General Assembly 2023

© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Frederik Kratzert¹, Grey Nearing², Nans Addor³,⁴, Tyler Erickson⁵, Martin Gauch¹,⁶, Oren Gilon⁷, Lukas Gudmundsson⁸, Avinatan Hassidim⁷, Daniel Klotz⁶, Sella Nevo⁷, Guy Shalev⁷, and Yossi Matias⁷

¹Google Research, Vienna, Austria (kratzert@google.com)
²Google Research, Mountain View, CA, United States
³Fathom, Square Works, Bristol, UK
⁴Geography, University of Exeter, Exeter, UK
⁵Google, Mountain View, CA, USA
⁶Institute for Machine Learning, Johannes Kepler University, Linz, Austria
⁷Google Research, Tel Aviv, Israel
⁸Institute for Atmospheric and Climate Science, ETH Zurich, Zurich, Switzerland

High-quality datasets are essential to support hydrological science and modeling. Several datasets exist for specific countries or regions (e.g. the various CAMELS datasets). However, these datasets lack standardization, which makes global studies difficult. Additionally, creating large-sample datasets is a time- and resource-consuming task, which often prevents the release of data that would otherwise be open. Caravan (as in “a series of camels”) is an initiative that tries to solve both of these problems by creating an open data processing environment in the cloud for the community to use.

Caravan is a globally consistent and open dataset

Caravan leverages globally available data sources that are published under an open license to derive meteorological forcings and attributes for any catchment. We use ERA5-Land for meteorological forcings and hydrological reference states (snow water equivalent, SWE, and four levels of soil moisture) and HydroATLAS for the catchment attributes. Currently, Caravan consists of 6830 gauges with daily streamflow data (median record length ~30 years), 9 meteorological variables (from 1981 to 2020) in different daily aggregations, 4 hydrological reference states, and a total of 221 catchment attributes.
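To illustrate what "different daily aggregations" of a sub-daily forcing can look like, the following is a minimal sketch in plain Python. It is not the Caravan processing code: the function name `daily_aggregates`, the choice of minimum, maximum and mean, the grouping by calendar day of the timestamps as given, and the synthetic temperature series are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import fmean


def daily_aggregates(hourly):
    """Group (timestamp, value) pairs by calendar day.

    Returns {date: {"min": ..., "max": ..., "mean": ...}}.
    Assumption: days are split at midnight of the timestamps as given.
    """
    by_day = defaultdict(list)
    for timestamp, value in hourly:
        by_day[timestamp.date()].append(value)
    return {
        day: {"min": min(vals), "max": max(vals), "mean": fmean(vals)}
        for day, vals in sorted(by_day.items())
    }


# Hypothetical hourly air temperature (deg C) covering two full days.
start = datetime(2000, 1, 1)
hourly_temperature = [
    (start + timedelta(hours=h), float(h % 24)) for h in range(48)
]
result = daily_aggregates(hourly_temperature)
```

Here `result` holds one entry per day, each with the three aggregates of that day's 24 hourly values.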

Caravan is derived entirely in the cloud

All meteorological time series (and hydrological reference states) from ERA5-Land are processed on Google Earth Engine, which removes the burden of downloading and processing large amounts of raw gridded data. Similarly, all catchment attributes are computed on Earth Engine. The code used to derive Caravan is publicly available (https://github.com/kratzert/Caravan/). Once you have streamflow records and the corresponding catchment polygons, deriving all other data (forcing data and attributes) is a matter of a few hours of actual work. Depending on the number, size and spatial distribution of the catchments that are processed at once, Earth Engine might take a day or two to extract the meteorological data and catchment attributes.
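The abstract does not spell out how gridded values are reduced to a single value per catchment. One common approach, shown here only as a sketch and not as the method used in the Caravan code, is an area-weighted mean over the grid cells that overlap the catchment polygon. The function name `area_weighted_mean` and the example numbers are hypothetical.

```python
def area_weighted_mean(cells):
    """Area-weighted average of gridded values over one catchment.

    cells: iterable of (value, overlap_area) pairs, where overlap_area is
    the part of a grid cell that lies inside the catchment polygon.
    """
    cells = list(cells)  # allow generators, since we iterate twice
    total_area = sum(area for _, area in cells)
    if total_area <= 0:
        raise ValueError("catchment does not overlap any grid cell")
    return sum(value * area for value, area in cells) / total_area


# Hypothetical example: three grid cells with daily precipitation (mm)
# and the area (km^2) of each cell that falls inside the catchment.
cells = [(2.0, 50.0), (4.0, 30.0), (10.0, 20.0)]
catchment_precip = area_weighted_mean(cells)
```

Cells that only touch the edge of the catchment contribute in proportion to their overlap, so small catchments are not dominated by a single coarse grid cell.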

Most importantly: Caravan is a community project

Even though the existing data in Caravan has good coverage over most climate zones, the spatial coverage is still patchy. This is where we see Caravan as a community effort. Given the provided code, anyone with access to streamflow data and the authorisation to redistribute it can create a Caravan extension with minimal effort and share the extension with the community, thus contributing to a dynamically growing dataset. A full step-by-step tutorial is available at https://github.com/kratzert/Caravan/wiki. We envision that, with many people participating, this will result in a truly global and spatially consistent large-sample hydrology dataset. A first Caravan extension was already published by Julian Koch (https://zenodo.org/record/7396466), which added 308 gauges in Denmark and increased the number of gauges to 7138.

How to cite: Kratzert, F., Nearing, G., Addor, N., Erickson, T., Gauch, M., Gilon, O., Gudmundsson, L., Hassidim, A., Klotz, D., Nevo, S., Shalev, G., and Matias, Y.: Caravan - A global community dataset for large-sample hydrology, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-5256, https://doi.org/10.5194/egusphere-egu23-5256, 2023.