EGU24-11115, updated on 08 Mar 2024
https://doi.org/10.5194/egusphere-egu24-11115
EGU General Assembly 2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Destination Earth Data Lake unlocking Big Earth Data processing

Danaele Puechmaille, Michael Schick, Borys Saulyak, Martin Dillmann, and Lothar Wolf
  • EUMETSAT, Digital Solutions and SAF, Germany (danaele.puechmaille@eumetsat.int)

The European Commission’s Destination Earth (DestinE) initiative will deploy several highly accurate thematic digital replicas of the Earth (Digital Twins) for monitoring and simulating natural and human activities, as well as their interactions. This will enable end-users and policy makers to run “what-if” scenarios to assess both the impact of environmental challenges (e.g. weather extremes and climate change) and the efficiency of proposed solutions. DestinE is implemented in a strategic partnership between the European Space Agency (ESA), the European Centre for Medium-Range Weather Forecasts (ECMWF) and the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT).

The Data Lake is one of the three components of the DestinE system, and it must tackle several technical challenges. Firstly, the unprecedented volumes of data generated frequently within DestinE call for novel, efficient data access and near-data processing services that go beyond the traditional “data-to-the-user” paradigm, in which users must download a multitude of files locally, extract the required parts (e.g. variables or area of interest) and only then use them as inputs to their algorithms.
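To make the contrast between the two paradigms concrete, the following minimal sketch counts the bytes that cross the network when a user needs one variable over a small area. The grid size, variable names, file count and the `Field`, `data_to_user_bytes` and `near_data_bytes` helpers are all hypothetical illustrations, not part of DestinE; the point is only that near-data subsetting ships the extracted values rather than whole files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical gridded variable stored in a file (values on a lat x lon grid)."""
    name: str
    n_lat: int
    n_lon: int
    bytes_per_value: int = 4  # assumed 32-bit floats

    def size_bytes(self) -> int:
        return self.n_lat * self.n_lon * self.bytes_per_value


def data_to_user_bytes(files: list[list[Field]]) -> int:
    """Traditional paradigm: every file is downloaded whole, then subset locally."""
    return sum(f.size_bytes() for file in files for f in file)


def near_data_bytes(files: list[list[Field]], variable: str, n_lat: int, n_lon: int) -> int:
    """Near-data paradigm: the variable/area subset is extracted next to the data,
    so only the extracted values cross the network."""
    total = 0
    for file in files:
        for f in file:
            if f.name == variable:
                total += min(n_lat, f.n_lat) * min(n_lon, f.n_lon) * f.bytes_per_value
    return total


# Illustrative numbers only: 24 hourly files, 5 variables each on a 0.1-degree global grid.
variables = ["t2m", "u10", "v10", "tp", "msl"]
files = [[Field(v, 1801, 3600) for v in variables] for _ in range(24)]

print(data_to_user_bytes(files))                # 3112128000
print(near_data_bytes(files, "t2m", 100, 200))  # 1920000
```

Under these assumed numbers, roughly 3.1 GB would be downloaded to obtain about 1.9 MB of useful values, a ratio that grows with the data volume.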

Secondly, the DestinE Data Lake must handle a wide variety of data. To offer users a uniform interface to all the data they need for their applications, the DestinE Data Lake must provide access not only to the challenging volumes of Digital Twin outputs but also to federated data from various existing and upcoming data spaces, beyond traditional Earth Observation. This variety is managed through a user-driven data portfolio and served by a harmonised data access layer that abstracts away the heterogeneity and complexity of the underlying data sources.
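One common way to build such a harmonised access layer is the adapter pattern: each backend keeps its native index, and a thin adapter translates it into a shared record type behind a single entry point. The sketch below assumes that pattern; `Item`, `TwinOutputStore`, `FederatedCatalogue`, `HarmonisedAccess`, the collection names and the URIs are hypothetical and do not describe the actual DestinE interfaces.

```python
from dataclasses import dataclass
from datetime import date
from typing import Protocol


@dataclass(frozen=True)
class Item:
    """Uniform record handed to users, whatever backend it came from."""
    source: str
    collection: str
    day: date
    uri: str


class DataSource(Protocol):
    """Contract every backend adapter fulfils (hypothetical)."""
    def collections(self) -> set[str]: ...
    def search(self, collection: str, start: date, end: date) -> list[Item]: ...


class TwinOutputStore:
    """Hypothetical Digital Twin archive indexed by ISO date strings."""
    def __init__(self, collection: str, index: dict[str, list[str]]):
        self._collection = collection
        self._index = index

    def collections(self) -> set[str]:
        return {self._collection}

    def search(self, collection: str, start: date, end: date) -> list[Item]:
        items = []
        for day_str, paths in self._index.items():
            day = date.fromisoformat(day_str)
            if start <= day <= end:
                items.extend(Item("twin-store", collection, day, p) for p in paths)
        return items


class FederatedCatalogue:
    """Hypothetical external data space listing (collection, yyyymmdd, url) rows."""
    def __init__(self, rows: list[tuple[str, int, str]]):
        self._rows = rows

    def collections(self) -> set[str]:
        return {c for c, _, _ in self._rows}

    def search(self, collection: str, start: date, end: date) -> list[Item]:
        items = []
        for c, yyyymmdd, url in self._rows:
            # Convert the integer date key into a real date object.
            day = date(yyyymmdd // 10000, yyyymmdd // 100 % 100, yyyymmdd % 100)
            if c == collection and start <= day <= end:
                items.append(Item("federated", c, day, url))
        return items


class HarmonisedAccess:
    """Single entry point: routes each collection to the backend that serves it."""
    def __init__(self, sources: list[DataSource]):
        self._routes: dict[str, DataSource] = {}
        for src in sources:
            for c in src.collections():
                self._routes[c] = src

    def search(self, collection: str, start: date, end: date) -> list[Item]:
        if collection not in self._routes:
            raise KeyError(f"unknown collection: {collection}")
        return sorted(self._routes[collection].search(collection, start, end),
                      key=lambda item: item.day)


lake = HarmonisedAccess([
    TwinOutputStore("climate-dt", {"2024-04-14": ["s3://twin/t2m_20240414"]}),
    FederatedCatalogue([("sentinel-2-l2a", 20240415, "https://example.org/s2/20240415")]),
])
for item in lake.search("sentinel-2-l2a", date(2024, 4, 1), date(2024, 4, 30)):
    print(item.source, item.uri)  # federated https://example.org/s2/20240415
```

The user only ever sees `Item` records, so adding a new data space means writing one more adapter rather than changing client code.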

Thirdly, the intensive processing requirements of the DestinE Digital Twins (DTs) are met by hosting them on European High-Performance Computing (EuroHPC) sites. Data produced by the DTs must therefore be processed where it is produced, at the edge of the DestinE Data Lake. This is achieved through a geographically distributed reference architecture, with cloud stacks deployed in close proximity to the HPC sites for efficient data exchange.
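The placement rule behind edge processing can be sketched as a small scheduling function: run a job on the cloud stack co-located with the HPC site that produced the data, and only fall back to a central site (paying for a wide-area transfer) when no such stack exists. The site names, the `Dataset`/`EdgeStack` types and `place_job` are hypothetical; the real DestinE scheduling logic is not described in this abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    produced_at: str  # HPC site where the Digital Twin wrote this output
    size_gb: float


@dataclass(frozen=True)
class EdgeStack:
    name: str
    colocated_with: str  # HPC site this cloud stack sits next to


def place_job(dataset: Dataset, edges: list[EdgeStack], central: str) -> tuple[str, float]:
    """Return (where to run, GB that must cross the wide-area network).

    Prefer the edge stack co-located with the producing HPC, so the data never leaves
    the site; only fall back to a central stack, paying the transfer, when none exists.
    """
    for edge in edges:
        if edge.colocated_with == dataset.produced_at:
            return edge.name, 0.0
    return central, dataset.size_gb


edges = [EdgeStack("edge-a", "hpc-a"), EdgeStack("edge-b", "hpc-b")]
print(place_job(Dataset("twin-run-1", "hpc-b", 800.0), edges, "central"))  # ('edge-b', 0.0)
print(place_job(Dataset("twin-run-2", "hpc-c", 800.0), edges, "central"))  # ('central', 800.0)
```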

Last but not least, DestinE follows a user-centric approach, evolving in response to on-boarded use cases. This requires a flexible architecture and a user-driven data portfolio and set of services that can readily adapt to emerging user needs and incorporate new services, workflows and data sources, including future Digital Twins.

How to cite: Puechmaille, D., Schick, M., Saulyak, B., Dillmann, M., and Wolf, L.: Destination Earth Data Lake unlocking Big Earth Data processing, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11115, https://doi.org/10.5194/egusphere-egu24-11115, 2024.