EGU25-17171, updated on 15 Mar 2025
https://doi.org/10.5194/egusphere-egu25-17171
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Thursday, 01 May, 10:45–12:30 (CEST), Display time Thursday, 01 May, 08:30–12:30
Hall X4, X4.79
Building the Copernicus Data Space Ecosystem STAC Catalog: Methodologies, Optimizations, and Community Impact
Marcin Niemyjski and Jan Musiał
  • CloudFerro S.A., Data Science, Warszawa, Poland (mniemyjski@cloudferro.com)

The Copernicus Program is the largest and most successful public space program globally. It provides continuous data across various spectral ranges, with an archive that exceeds 84 PB and grows by approximately 20 TB per day; both figures are expected to increase further. The openness of its data has contributed to the widespread use of Earth observation and to the development of commercial products built on open data in Europe and worldwide. The entire archive, along with cloud-based data processing capabilities, is available free of charge through the Copernicus Data Space Ecosystem (CDSE) initiative, which continues to evolve to meet global user standards.

This paper presents the process of creating the Copernicus Data Space Ecosystem (CDSE) STAC catalog, the largest and most comprehensive STAC catalog globally in terms of metadata. It details the workflow, starting from the development of a metadata model for Sentinel data, through efficient indexing based on the original metadata files accompanying the products, to result validation and ingestion into the backend system (via a database DSN). A particular highlight is that the entire process is executed with a single tool, eometadatatool, initially developed by DLR and then further enhanced and released as open-source software by the CloudFerro team. The eometadatatool extracts metadata from the original files accompanying products of the Copernicus program and of other sources (e.g., Landsat, Copernicus Contributing Missions). Extraction is driven by a CSV file that lists, for each metadata field, its name, the file in which it occurs, and the path to the key within that file. Since the CDSE repository operates as an S3 resource offering users free access, the tool accesses products via S3 by default, and this access is configurable through environment variables. Together, these features make eometadatatool the most powerful stactools package available (stactools is a high-level command-line tool and Python library for working with STAC), providing both valid STAC items and a method for uploading them to the selected backend.
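The CSV-driven extraction described above can be illustrated with a minimal sketch. It is not eometadatatool's actual API or CSV layout: the column names (name, file, path), the sample XML files, and the extract_metadata function are assumptions made for illustration, and the metadata files are held in memory rather than read from S3.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical mapping: metadata name, file it occurs in, path to the key.
MAPPING_CSV = """name,file,path
platform,manifest.xml,./info/platform
datetime,manifest.xml,./info/start
cloud_cover,quality.xml,./cloud/percentage
"""

# Hypothetical stand-ins for the metadata files that accompany a product.
PRODUCT_FILES = {
    "manifest.xml": (
        "<root><info><platform>sentinel-2a</platform>"
        "<start>2025-01-01T10:00:00Z</start></info></root>"
    ),
    "quality.xml": "<root><cloud><percentage>12.5</percentage></cloud></root>",
}


def extract_metadata(mapping_csv, files):
    """Return {metadata name: text value} by following each CSV row."""
    parsed = {}  # cache so each XML file is parsed only once
    result = {}
    for row in csv.DictReader(io.StringIO(mapping_csv)):
        file_name = row["file"]
        if file_name not in parsed:
            parsed[file_name] = ET.fromstring(files[file_name])
        node = parsed[file_name].find(row["path"])
        # A missing key yields None instead of raising an error.
        result[row["name"]] = node.text if node is not None else None
    return result


metadata = extract_metadata(MAPPING_CSV, PRODUCT_FILES)
```

Keeping the mapping in a CSV file means that supporting a new product type is mostly a matter of writing new rows rather than new code, which is presumably what allows one tool to cover several missions.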

The STAC specification itself has been influenced by the CDSE catalog development process, which contributed to the evolution of the standard through the introduction of version 1.1 and of updated extensions (storage, eo, proj) that better meet user needs. The paper discusses the most significant modifications and their impact on the catalog's functionality, and outlines the main differences.

Particular attention is given to performance optimization because of the substantial data volume and high update frequency. The study examines the configuration and performance testing (using Locust) of the frontend layer (stac-fastapi-pgstac) and the backend (pgstac). The stac-fastapi-pgstac implementation was deployed on a scalable Kubernetes cluster and set up to carry out product hydration (a step specific to how pgstac manages JSON data) using Python's native capabilities. The pgstac schema was deployed on a dedicated bare-metal server with a PostgreSQL database, using master-worker replication enabled through appropriate pgstac configuration. Both tools are open source; the best-performing configurations found are documented and will be presented in detail.
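As background for the hydration step mentioned above: pgstac stores items in a reduced ("dehydrated") form, omitting values shared with a collection-level base item, and restores them when items are returned. The sketch below is a simplified illustration of that idea, not pgstac's or stac-fastapi-pgstac's actual code; the hydrate function and the sample data are assumptions, and special cases handled by the real implementation (such as lists and keys that are explicitly marked as not to be merged) are left out.

```python
import copy


def hydrate(base_item, dehydrated):
    """Fill keys missing from `dehydrated` with values from `base_item`.

    Nested dictionaries are merged recursively; values already present
    in the dehydrated item always take precedence over the base item.
    """
    result = copy.deepcopy(dehydrated)
    for key, base_value in base_item.items():
        if key not in result:
            result[key] = copy.deepcopy(base_value)
        elif isinstance(base_value, dict) and isinstance(result[key], dict):
            result[key] = hydrate(base_value, result[key])
    return result


# Hypothetical base item holding fields shared by every item in a collection.
base = {
    "type": "Feature",
    "stac_version": "1.1.0",
    "assets": {"B04": {"type": "image/jp2", "roles": ["data"]}},
}

# Hypothetical stored item containing only the item-specific values.
stored = {
    "id": "S2A_example",
    "assets": {"B04": {"href": "s3://example-bucket/B04.jp2"}},
}

item = hydrate(base, stored)
```

Performing this merge in the API layer rather than in the database moves CPU work from the single PostgreSQL server to API instances that can be scaled horizontally, which is one plausible reason for choosing it in a Kubernetes deployment.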

The presented solution empowers the community to fully utilize the new catalog, leverage its functionalities, and access open tools that enable independent construction of STAC catalogs compliant with ESA and community recommendations. 

How to cite: Niemyjski, M. and Musiał, J.: Building the Copernicus Data Space Ecosystem STAC Catalog: Methodologies, Optimizations, and Community Impact, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-17171, https://doi.org/10.5194/egusphere-egu25-17171, 2025.