- European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT), Darmstadt, Germany (firstname.lastname@eumetsat.int)
Destination Earth (DestinE) is the European Commission’s initiative to gradually develop Digital Twins (DTs) of the Earth with unprecedented accuracy and resolution. DestinE will initially provide DTs for adapting to climate change, forecasting extreme events and interactive use of high-resolution climate data. Insights from these models help scientists and policymakers study and plan for future weather- and climate-induced events.
Stakeholders implementing what-if scenarios and/or ready-to-use applications on DestinE require efficient storage of, and seamless access to, a large volume of heterogeneous data that often originate from different sources. EUMETSAT has implemented the DestinE Data Lake (DEDL) to address these challenges. The DEDL offers the Harmonised Data Access (HDA) service, which enables access to diverse data from the DEDL data portfolio via a unified STAC API. For power users, it also offers DEDL edge services on request: a dynamic suite of distributed big data processing components that operate close to DestinE’s massive data repositories. The edge services offered are STACK (DEDL-managed software applications such as JupyterHub, Dask and Open Data Cube), ISLET (project-managed compute and storage services such as configurable virtual machines and S3 object storage) and HOOK (scheduling and execution of pre-defined or user-defined high-level workflows, such as setting up a data processing pipeline).
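As an illustration of what access through a unified STAC API can look like, the following minimal sketch builds the JSON body of a STAC API item search. The field names (`collections`, `bbox`, `datetime`, `limit`) come from the STAC API specification; the endpoint URL, the collection identifier and the helper name `build_stac_search` are hypothetical placeholders and not taken from the HDA service. No network request is made.

```python
import json

# Hypothetical endpoint: the actual HDA URL is not given in this abstract.
HDA_STAC_SEARCH_URL = "https://hda.example.invalid/stac/search"


def build_stac_search(collections, bbox, start, end, limit=10):
    """Build the JSON body of a STAC API item search (POST /search).

    bbox is (west, south, east, north) in degrees; start and end are
    RFC 3339 timestamps that form a closed datetime interval.
    """
    west, south, east, north = bbox
    # West may exceed east for boxes crossing the antimeridian,
    # so only the latitude ordering is checked here.
    if south > north:
        raise ValueError("bbox south edge must not exceed north edge")
    return {
        "collections": list(collections),
        "bbox": [west, south, east, north],
        "datetime": f"{start}/{end}",
        "limit": limit,
    }


# Placeholder collection identifier, for illustration only.
body = build_stac_search(
    ["EXAMPLE.COLLECTION.ID"],
    (5.0, 47.0, 15.0, 55.0),
    "2024-01-01T00:00:00Z",
    "2024-01-31T23:59:59Z",
)
print(json.dumps(body, indent=2))
```

A client would send this body as an HTTP POST to the search endpoint of the service, with whatever authentication the service requires; those details are outside the scope of this sketch.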
To efficiently exploit the wealth of data available on DestinE, the DEDL edge services will be extended with the infrastructure and software needed for Artificial Intelligence/Machine Learning (AI/ML) activities. DEDL will offer an ML Operations (MLOps) service tailored to Earth Observation (EO) data, which allows users to engage in the various steps of AI/ML such as data preprocessing, model training and evaluation, experiment tracking, model deployment, model inference and monitoring. The modular DEDL MLOps architecture will let users pick components as required, without being bound to pre-defined workflows and pipelines. Users can furthermore develop their AI/ML algorithms according to CI/CD best practices and maintain separate environments for development, staging and production.
A specific focus of DEDL will be to define and work with highly flexible data pipelines. The framework will allow DestinE data portfolio datasets to be converted to AI-ready formats, which can readily be used as inputs for various AI/ML models. The framework will be able to combine and harmonise data from various sources and formats, and will provide typical EO pre-processing steps such as data collocation, re-projection and re-gridding, among other operations.
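To make the re-gridding step concrete, here is a minimal sketch that coarsens a regular 2D grid by averaging non-overlapping blocks of cells. It is a deliberately simplified stand-in and not the DEDL implementation: the function name `regrid_block_mean` is hypothetical, the grid is assumed to be regular with dimensions divisible by the coarsening factor, and operational pipelines would typically use area-weighted or conservative remapping instead.

```python
def regrid_block_mean(grid, factor):
    """Coarsen a regular 2D grid by averaging factor x factor blocks.

    grid is a list of equally long rows of numbers. Both dimensions
    must be divisible by factor (an assumption of this sketch).
    """
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    coarse = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            # Collect every fine cell that falls inside this coarse cell.
            block = [
                grid[r + i][c + j]
                for i in range(factor)
                for j in range(factor)
            ]
            out_row.append(sum(block) / len(block))
        coarse.append(out_row)
    return coarse


# Synthetic 4 x 4 field, for illustration only.
fine = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]
coarse = regrid_block_mean(fine, 2)
print(coarse)  # [[3.5, 5.5], [11.5, 13.5]]
```

Bringing two datasets onto a common grid in this way is what makes it possible to stack them as input channels for a model.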
This presentation will highlight the AI/ML and MLOps capabilities of the DEDL, demonstrating how they empower users to efficiently analyse data and derive valuable insights. By seamlessly integrating with DestinE’s data ecosystem, these advancements enable users to focus on innovation and address critical challenges such as climate adaptation and extreme event forecasting, rather than on managing complex workflows or infrastructure.
How to cite: Montazeri, S., Stoicescu, M., Hinojo Comellas, O., Puechmaille, D., and Schick, M.: MLOps on DestinE Data Lake – Towards Reproducible AI on Edge Services, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-10817, https://doi.org/10.5194/egusphere-egu25-10817, 2025.