EGU25-12349, updated on 15 Mar 2025
https://doi.org/10.5194/egusphere-egu25-12349
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Monday, 28 Apr, 17:15–17:25 (CEST)
 
Room -2.92
How Destination Earth Data Lake support Destination Earth users
Patryk Grzybowski1, Marcin Ziółkowski1, Aubin Lambare2, Christoph Reimer3, and Michael Schick4
  • 1CloudFerro S.A., Data Science, Warszawa, Poland (pgrzybowski@cloudferro.com)
  • 2CS Group - Sopra Steria, Le Plessis Robinson, France (aubin.lambare@cs-soprasteria.com)
  • 3EODC, Vienna, Austria (christoph.reimer@eodc.eu)
  • 4EUMETSAT, Darmstadt, Germany (michael.schick@eumetsat.int)

Destination Earth (DestinE) is a flagship initiative led by the European Commission, implemented by EUMETSAT, ESA and ECMWF. It aims to create highly detailed Digital Twins (DTs) of the Earth, enabling precise simulations for a variety of uses. Currently, the initiative focuses on two primary Digital Twins: the Weather Extremes Digital Twin (ExtremeDT) and the Climate Change Adaptation Digital Twin (ClimateDT). Over the coming years, the scope of DTs is set to expand, necessitating improved access to data and streamlined methods for working with it. This is where the Destination Earth Data Lake (DEDL) plays a pivotal role, offering comprehensive data discovery, access, and processing services tailored to the needs of DestinE users.

The DEDL operates on two key levels: ‘Data Discovery and Access’ and ‘Edge Services’. The Data Discovery and Access services are provided by the Harmonized Data Access (HDA) tool, a single, federated entry point to services and data, including existing datasets and complementary sources such as in-situ and socio-economic data. Notably, it also provides access to the unique datasets generated by DestinE’s DTs. The services rely on the SpatioTemporal Asset Catalogs (STAC) standard, which means:

  • Searches within a dataset follow the STAC protocol;
  • The Federated Catalog search proxy component converts STAC queries into queries adapted to the underlying catalog and returns the results to the user in STAC format.
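
The two points above can be illustrated with a minimal Python sketch. It is not the HDA implementation: the field names `collections`, `bbox`, `datetime` and `limit` come from the STAC API Item Search specification, while the collection ID, the native catalogue's parameter names (`dataset`, `box`, `startDate`, `completionDate`, `maxRecords`) and its record layout (`name`, `footprint`, `sensed`, `url`) are hypothetical and chosen only for illustration. The STAC Item produced at the end is deliberately minimal and omits fields such as `bbox`.

```python
import json
from datetime import datetime, timezone


def build_stac_search(collection, bbox, start, end, limit=10):
    """Build a request body following the STAC API Item Search field names."""
    if len(bbox) != 4:
        raise ValueError("bbox must be [west, south, east, north]")
    # Datetimes are formatted as UTC; pass timezone-aware UTC values.
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {
        "collections": [collection],
        "bbox": list(bbox),
        "datetime": f"{start.strftime(fmt)}/{end.strftime(fmt)}",
        "limit": limit,
    }


def to_backend_query(stac_body):
    """Translate the STAC body into parameters of a HYPOTHETICAL native catalogue."""
    west, south, east, north = stac_body["bbox"]
    start, end = stac_body["datetime"].split("/")
    return {
        "dataset": stac_body["collections"][0],
        "box": f"{west},{south},{east},{north}",
        "startDate": start,
        "completionDate": end,
        "maxRecords": stac_body["limit"],
    }


def to_stac_item(native_record, collection):
    """Wrap a HYPOTHETICAL native record as a minimal STAC Item."""
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": native_record["name"],
        "collection": collection,
        "geometry": native_record["footprint"],
        "properties": {"datetime": native_record["sensed"]},
        "assets": {"data": {"href": native_record["url"]}},
        "links": [],
    }


# "EXAMPLE_COLLECTION" is a placeholder, not a real DEDL collection ID.
body = build_stac_search(
    "EXAMPLE_COLLECTION",
    [14.0, 49.0, 24.2, 54.9],
    datetime(2024, 6, 1, tzinfo=timezone.utc),
    datetime(2024, 6, 30, tzinfo=timezone.utc),
)
print(json.dumps(body, indent=2))
print(to_backend_query(body))
```

In this sketch the user only ever sees the STAC side (`build_stac_search` and the items returned by `to_stac_item`); the translation step in the middle is what a federating proxy would hide.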

The cloud computing service is powered by the ISLET infrastructure, a distributed Infrastructure as a Service (IaaS) built on OpenStack. It allows users to manage virtual machines and S3 storage and to run advanced computations via a graphical user interface or a command-line interface. A standout feature of ISLET is its proximity to data sources: it operates near High-Performance Computing (HPC) facilities and connects to them through data bridges, enabling efficient processing of large datasets, including those from Digital Twins, in conjunction with HPC systems.

The STACK environment supports application development with JupyterHub and Dask in Python and R. Users can create Dask clusters on selected infrastructure (sites) to process data directly where it resides, removing the need for extensive local setup and optimization.

Hook Services is a set of pre-defined workflows that users can run as ready-to-use processors, such as Sentinel-2 MAJA atmospheric correction and Sentinel-1 terrain-corrected backscatter. It also provides workflow functions that generate higher-level products on demand, such as temporal composites.
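
To make the idea of a temporal composite concrete, the following minimal Python sketch computes a per-pixel median over a small stack of scenes. The abstract does not state which compositing method Hook Services applies, so the median, the function name `median_composite` and the use of `None` for masked pixels are assumptions made purely for illustration.

```python
from statistics import median


def median_composite(stack, nodata=None):
    """Per-pixel median over a time series of equally sized 2-D grids.

    Pixels equal to `nodata` (for example cloud-masked values) are ignored;
    a pixel with no valid observation stays `nodata` in the result.
    """
    if not stack:
        raise ValueError("stack must contain at least one scene")
    rows, cols = len(stack[0]), len(stack[0][0])
    composite = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            values = [scene[r][c] for scene in stack if scene[r][c] != nodata]
            out_row.append(median(values) if values else nodata)
        composite.append(out_row)
    return composite


# Three hypothetical 2x2 scenes (scaled reflectance); None marks masked pixels.
scenes = [
    [[100, 200], [None, None]],
    [[300, 200], [500, None]],
    [[200, 800], [700, None]],
]
print(median_composite(scenes))
```

A median is a common choice for composites because a single outlier, such as the value 800 in the third scene, does not shift the result.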

DEDL is a transformative initiative that revolutionizes how Earth Observation data is managed and utilized. By integrating innovative infrastructure (ISLET), data services (HDA), reliable processors (Hook Services), and user-friendly development tools (STACK), DEDL enables unprecedented levels of data harmonization, federation, and processing. Moreover, the DEDL plays a crucial role in empowering DestinE users by providing them with seamless access to vast datasets and advanced computational tools. It simplifies the process of data exploration, integration, and analysis, enabling researchers, policymakers, and developers to focus on innovation and decision-making rather than technical barriers. This cutting-edge system enhances climate research capabilities and supports sustainable development efforts on a scale previously unattainable.

How to cite: Grzybowski, P., Ziółkowski, M., Lambare, A., Reimer, C., and Schick, M.: How Destination Earth Data Lake support Destination Earth users, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-12349, https://doi.org/10.5194/egusphere-egu25-12349, 2025.