- (1) Lampata (krasen@lampata.co.uk)
- (2) ESA
- (3) Serco
The EarthCODE Open Science Catalog (https://opensciencedata.esa.int/catalog) currently contains over 300 data products, most of them the result of peer-reviewed scientific research. These products exist as disparate individual datasets, mostly grouped under themes or variables. This fragmentation is a barrier to interoperability: a scientist has to combine the datasets manually, for example by reprojecting, regridding, or temporally resampling heterogeneous data.
EarthCODE is creating a new category of products, combined data cubes for each of the Open Science Catalog’s themes, to streamline access for researchers and ensure the data is truly "Analysis-Ready Data" (ARD). Combining the data products onto a single grid and a single projection will drastically reduce the overhead researchers face when harmonizing the appropriate datasets. The workflow focuses on combining different datasets and on collaborating with scientists to curate the appropriate data and to minimise disruption during the transformation process, since any reprojection or regridding introduces uncertainties.
We demonstrate the efficacy of this Pangeo-aligned workflow through the Antarctica InSync project (https://discourse-earthcode.eox.at/t/antartica-insync-data-cubes/107). The project followed a multi-stage pipeline carried out in close collaboration with the scientific community. The first step was aggregating the relevant Antarctic datasets. This step is valuable in its own right, since it centralizes domain knowledge and ensures the Open Science Catalog contains the latest datasets relevant to the research community.
The second step involved processing the data with cloud-native tools to bring it onto the same projection, a common grid, and in some cases the same resolution, creating coherent STAC Collections. The third step involved generating detailed variable-level metadata for all datasets to ensure high Findability and Reusability. We also provide visualisation tools for exploring the data cube through cloud-optimized formats without downloading it, together with a discussion forum. To foster open science and reproducibility, our accompanying library will contain all generalizable functions that were used to generate this data, allowing the community to reuse these workflows for other domains.
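The regridding and metadata steps described above can be illustrated with a minimal, dependency-free sketch. It is not the project's pipeline or library. It assumes the inputs already share one projection (the reprojection itself is omitted) and uses plain bilinear interpolation as a stand-in for whatever resampling the workflow applies. All names (`regrid_bilinear`, `build_cube`, the variable names, units, and values) are hypothetical.

```python
from bisect import bisect_right


def _bracket(coords, c):
    """Return (i, t): the cell index and fractional position of c in ascending coords.

    Points outside the source extent are clamped to the nearest edge.
    """
    i = bisect_right(coords, c) - 1
    i = max(0, min(i, len(coords) - 2))
    t = (c - coords[i]) / (coords[i + 1] - coords[i])
    return i, max(0.0, min(1.0, t))


def regrid_bilinear(src_x, src_y, values, dst_x, dst_y):
    """Bilinearly interpolate values[j][i] (at src_y[j], src_x[i]) onto the target grid."""
    out = []
    for y in dst_y:
        j, ty = _bracket(src_y, y)
        row = []
        for x in dst_x:
            i, tx = _bracket(src_x, x)
            lower = values[j][i] * (1 - tx) + values[j][i + 1] * tx
            upper = values[j + 1][i] * (1 - tx) + values[j + 1][i + 1] * tx
            row.append(lower * (1 - ty) + upper * ty)
        out.append(row)
    return out


def build_cube(datasets, dst_x, dst_y):
    """Regrid every dataset onto one grid and keep variable-level metadata with it."""
    cube = {"coords": {"x": list(dst_x), "y": list(dst_y)}, "variables": {}}
    for name, ds in datasets.items():
        cube["variables"][name] = {
            "data": regrid_bilinear(ds["x"], ds["y"], ds["values"], dst_x, dst_y),
            # Record the transformation so users know the data were resampled.
            "attrs": dict(ds["attrs"], regridding="bilinear"),
        }
    return cube


# Hypothetical inputs: two variables on different grids (coordinates in km).
coarse_x = coarse_y = [0.0, 10.0, 20.0]
fine_x = fine_y = [0.0, 5.0, 10.0, 15.0, 20.0]
datasets = {
    "ice_velocity": {
        "x": coarse_x, "y": coarse_y,
        "values": [[2 * x + 3 * y for x in coarse_x] for y in coarse_y],
        "attrs": {"units": "m/yr", "source": "hypothetical dataset A"},
    },
    "surface_elevation": {
        "x": fine_x, "y": fine_y,
        "values": [[x - y for x in fine_x] for y in fine_y],
        "attrs": {"units": "m", "source": "hypothetical dataset B"},
    },
}

# Use the finer grid as the common target grid for the cube.
cube = build_cube(datasets, fine_x, fine_y)
```

Storing the resampling method in each variable's attributes is one simple way to keep the uncertainty introduced by regridding visible to downstream users.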
How to cite: Samardzhiev, K., Samardzhiev, D., Anghelea, A., and Dobrowolska, E.: From Disparate Datasets to Analysis-Ready Data Cubes with Pangeo on EarthCODE, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-21395, https://doi.org/10.5194/egusphere-egu26-21395, 2026.