EGU23-12851
https://doi.org/10.5194/egusphere-egu23-12851
EGU General Assembly 2023
© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

From the Copernicus satellite data to an environmentally aware field decision

Fabien Castel and Emma Rizzi
  • Murmuration, France (fabien.castel@murmuration-sas.com)

Tackling complex environmental issues requires accessing and processing a wide range of voluminous data. The Copernicus spatial data is a very complete and valuable source for many Earth science domains, in particular thanks to its Core Services (Land, Atmosphere, Marine…). For almost five years now, Copernicus DIAS platforms have provided broad access to the Core Services products through the cloud. Among these platforms, the WEkEO platform operated by EUMETSAT, Mercator Ocean, ECMWF and EEA provides wider access to Copernicus Core Services data.

However, Copernicus data needs an additional layer of processing and preparation before it can be presented to and understood by the general public and decision makers. Murmuration has developed data processing pipelines that produce environmental indicators from Copernicus data. These indicators are powerful tools for putting environmental issues at the centre of decision-making processes.

Throughout their use, limitations of the DIAS platforms were observed. Firstly, the cloud service offerings are basic in comparison to those of the market leaders (such as AWS and GCP). In particular, there is no built-in solution for automating and managing data processing pipelines, which must be set up at the user's expense. Secondly, the cost of resources is higher than the market price. Limiting the activities on DIAS to edge data processing and relying on a cheaper offering for applications that do not require direct access to raw Copernicus data is a cost-effective choice. Finally, the performance and reliability requirements for data access cannot always be met when relying on a single DIAS platform. Implementing a multi-DIAS approach ensures backup data sources. This raises the question of the automation and orchestration of such a multi-cloud system.
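
The idea of backup data sources in a multi-DIAS setup can be illustrated with a minimal sketch. It is not the system described here: the two fetcher functions are hypothetical stand-ins for platform clients, and the product identifier is invented for the example. The sketch only shows the fallback logic, where sources are tried in order and the first successful one is used.

```python
from typing import Callable, Iterable, Tuple

# A fetcher takes a product identifier and returns the product bytes,
# or raises an exception when the platform cannot serve the request.
Fetcher = Callable[[str], bytes]


def fetch_with_fallback(product_id: str,
                        sources: Iterable[Tuple[str, Fetcher]]) -> Tuple[str, bytes]:
    """Try each source in order; return (source_name, data) from the first success."""
    errors = []
    for name, fetch in sources:
        try:
            return name, fetch(product_id)
        except Exception as exc:  # any failure moves on to the next source
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all sources failed: " + "; ".join(errors))


# Hypothetical stand-ins for two DIAS platform clients (assumptions).
def primary_dias(product_id: str) -> bytes:
    raise ConnectionError("primary platform unreachable")


def backup_dias(product_id: str) -> bytes:
    return f"payload for {product_id}".encode()


source_name, data = fetch_with_fallback(
    "example-product", [("primary", primary_dias), ("backup", backup_dias)]
)
print(source_name, data)
```

In this run the primary source fails, so the data comes from the backup. In a real deployment the order of sources, retry policy and timeouts would need to be chosen per platform.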

We propose an approach combining the wide data offer of the DIAS platforms, the automation features provided by the Prefect platform and the usage of efficient cloud technologies to build a repository of environmental indicators. Prefect is a hybrid orchestration platform dedicated to automation of data processing flows. It does not host any data processing flow itself and rather connects in a cloud-agnostic way to any cloud environment, where periodic and triggered flow executions can be scheduled. Prefect centrally controls flows that run on different cloud environments through a single platform.

The technologies leveraged to build the system make it possible to produce and disseminate the environmental indicators efficiently: firstly, containerisation and clustering (using Docker and Kubernetes) to manage processing resources; secondly, object storage combined with cloud-native access (Zarr data format); and finally, the Python scientific software stack (including pandas, scikit-learn, etc.) complemented by the powerful Xarray library. Data processing pipelines ensure a path from the NetCDF Copernicus Core Services products to cloud-native Zarr products. The Zarr format allows windowed read/write operations, avoiding unnecessary data transfers. This efficient data access makes it possible to plug fast data dissemination services, following well-established OGC standards, into the data repository and to feed interactive dashboards for decision makers. The cycle is complete, from the Copernicus satellite data to an environmentally aware field decision.
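
Why windowed access on a chunked format avoids unnecessary transfers can be shown with a small calculation. The sketch below does not use the Zarr library. It only computes which chunks of a regularly chunked array overlap a requested window, which is the principle behind partial reads. The array shape, chunk shape and window are assumed values chosen for illustration.

```python
from itertools import product
from math import prod
from typing import List, Sequence, Tuple


def chunks_for_window(window: Sequence[Tuple[int, int]],
                      chunk_shape: Sequence[int]) -> List[Tuple[int, ...]]:
    """Return the grid indices of the chunks overlapping a window.

    window: one (start, stop) pair per dimension, stop exclusive.
    chunk_shape: chunk length along each dimension.
    """
    per_dim = []
    for (start, stop), size in zip(window, chunk_shape):
        first = start // size        # chunk holding the first element
        last = (stop - 1) // size    # chunk holding the last element
        per_dim.append(range(first, last + 1))
    return list(product(*per_dim))


# Hypothetical (time, lat, lon) array and chunking, for illustration only.
array_shape = (365, 1800, 3600)
chunk_shape = (30, 300, 300)

# Window: 10 time steps over a 200 x 200 pixel area.
window = [(100, 110), (450, 650), (1000, 1200)]

touched = chunks_for_window(window, chunk_shape)
# Ceiling division gives the number of chunks along each dimension.
total_chunks = prod(-(-n // c) for n, c in zip(array_shape, chunk_shape))
print(f"{len(touched)} of {total_chunks} chunks read")
```

With these assumed values only 2 of 936 chunks overlap the window, so a reader that fetches whole chunks from object storage transfers a small fraction of the array.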

How to cite: Castel, F. and Rizzi, E.: From the Copernicus satellite data to an environmentally aware field decision, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-12851, https://doi.org/10.5194/egusphere-egu23-12851, 2023.
