ESSI2.15 | Seamless transitioning between HPC and cloud in support of Earth Observation, Earth Modeling and community-driven Geoscience approach PANGEO
Co-organized by AS5/CL5/GI1/OS5
Convener: Tina Odaka | Co-conveners: Vasileios Baousis, Anne Fouilloux, Stathes Hadjiefthymiades, Ross A. W. Slater, Alejandro Coca-Castro
Orals
| Fri, 02 May, 14:00–15:43 (CEST), 16:15–18:00 (CEST)
 
Room -2.32
Posters on site
| Attendance Fri, 02 May, 10:45–12:30 (CEST) | Display Fri, 02 May, 08:30–12:30
 
Hall X4
Cloud computing has emerged as a dominant paradigm, supporting industrial applications and academic research on an unprecedented scale. Despite its transformative potential, transitioning to the cloud continues to challenge organizations striving to leverage its capabilities for big data processing. Integrating cloud technologies with high-performance computing (HPC) unlocks powerful possibilities, particularly for computation-intensive AI/ML workloads. With innovations like GPUs, containerization, and microservice architectures, this convergence enables scalable solutions for Earth Observation (EO) and Earth System Modeling domains.
Pangeo (pangeo.io) represents a global, open-source community of researchers and developers collaborating to tackle big data challenges in geoscience. By leveraging a range of tools—from laptops to HPC and cloud infrastructure—the Pangeo ecosystem empowers researchers with an array of core packages, including Xarray, Dask, Jupyter, Zarr, Kerchunk, and Intake.
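The computational pattern behind these packages can be illustrated with a small sketch: Dask splits large arrays into chunks and reduces them incrementally, so only one chunk needs to be resident in memory at a time. The plain-NumPy version below is a hand-rolled stand-in for what Dask automates; the array shape and chunk size are arbitrary placeholders, not tied to any particular dataset.

```python
import numpy as np

# Hypothetical dataset: a year of daily 2D fields (shape is a placeholder).
data = np.random.default_rng(0).normal(size=(365, 64, 128))

def chunked_time_mean(arr, chunk=90):
    """Mean over axis 0, visiting one chunk of time steps at a time."""
    total = np.zeros(arr.shape[1:])
    for start in range(0, arr.shape[0], chunk):
        block = arr[start:start + chunk]  # only this block is "in memory"
        total += block.sum(axis=0)
    return total / arr.shape[0]

result = chunked_time_mean(data)
# Identical (up to float rounding) to the all-at-once reduction:
assert np.allclose(result, data.mean(axis=0))
```

With Xarray and Dask the same computation is a single lazy `mean` over the time dimension of a chunked array; the loop above only makes the chunking explicit.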
This session focuses on use cases involving both Cloud and HPC computing and showcases applications of Pangeo’s core packages.
This session aims to:
• Assess the current landscape and outline the steps needed to facilitate the broader adoption of cloud computing in Earth Observation and Earth Modeling data processing.
• Inspire researchers using or contributing to the Pangeo ecosystem to share their insights with the broader geoscience community and showcase new applications of Pangeo tools addressing computational and data-intensive challenges.
We warmly welcome contributions that explore:
• Cloud Computing Initiatives: Federations, scalability, interoperability, multi-provenance data, security, privacy, and sustainable computing.
• Cloud Applications and Platforms: Development and deployment of IaaS, PaaS, SaaS, and XaaS solutions.
• Cloud-Native AI/ML Frameworks: Tools designed for AI/ML applications in EO and ESM.
• Operational Systems and Workflows: Cloud-based operational systems, data lakes, and storage solutions.
• HPC and Cloud Integration: Converging workloads to leverage the strengths of both computational paradigms.
In addition, we invite presentations showcasing applications of Pangeo’s core packages in:
• Atmosphere, Ocean, and Land Modeling
• Satellite Observations
• Machine Learning
• Cross-Domain Geoscience Challenges
This session emphasizes real-world use cases at the intersection of cloud and HPC computing. By sharing interactive workflows, reproducible research practices, and live executable notebooks, contributors can help map the current landscape and outline actionable pathways toward broader adoption of these transformative technologies in geoscience.

Orals: Fri, 2 May | Room -2.32

The oral presentations are given in a hybrid format supported by a Zoom meeting featuring on-site and virtual presentations.
Chairpersons: Vasileios Baousis, Stathes Hadjiefthymiades, Tina Odaka
14:00–14:02
14:02–14:12
|
EGU25-10683
|
On-site presentation
Federico Fornari, Vasileios Baousis, Mohanad Albughdadi, Marica Antonacci, Tolga Kaprol, Claudio Pisa, Charalampos Andreou, Kakia Panagidi, and Stathes Hadjiefthymiades

The Copernicus program has fostered Earth Observation (EO) and Earth Modeling by offering extensive data and services to European Citizens. Sentinel satellites’ data is accessible  through platforms like the Copernicus Open Access Hub and the Copernicus Data Space Ecosystem, which provide a wide range of information on land, ocean and atmospheric conditions. Complementing these resources, six specialized Copernicus services deliver data in domains such as the atmosphere, marine environment, land monitoring, climate change, security and emergency response. To streamline access and usability, cloud-based Copernicus Data and Information Access Services (DIAS) offer centralised platforms equipped with cloud infrastructure and processing tools. Building on these efforts, the Copernicus Data Space Ecosystem (https://dataspace.copernicus.eu/) enhances existing DIAS services with advanced functionalities like improved search capabilities, virtualizations and APIs. Meanwhile, the Destination Earth (DestinE) initiative led by ECMWF, EUMETSAT and ESA, aims to develop high-precision digital Earth models - or digital twins - that simulate natural and human activities. These models mainly focus on weather-induced extremes and climate change adaptation, generating valuable Earth Modeling data. Furthermore, European Data Spaces integrate datasets across diverse domains, including agriculture, health, energy, and environmental monitoring, creating opportunities to combine these resources with Copernicus and DestinE data through advanced technologies like artificial intelligence (AI) and machine learning (ML). This integration paves the way for innovative solutions and public-facing products and services. Despite the volume and richness of Copernicus and related EO data, its accessibility remains limited, with most users being experts or scientists. 
For broader industry adoption and the development of impactful applications that benefit society and the environment, significant barriers must be addressed. EO data is often fragmented, complex, and difficult to process, requiring domain expertise for tasks such as data discovery, pre-processing, storage, and conversion into formats suitable for analytics and Geographic Information Systems (GIS).

The EO4EU platform (https://www.eo4eu.eu/), showcased in this presentation, introduces a multi-cloud ecosystem designed for holistic management of EO data. Its primary objective is to bridge the gap between domain experts and end users, leveraging technological advancements to broaden the adoption of EO data across diverse markets. By enhancing the accessibility and usability of EO data, EO4EU supports market growth through advanced data modeling, dynamic annotation, and state-of-the-art processing, powered by European cloud infrastructures such as WEkEO/DIAS and CINECA. EO4EU provides a suite of innovative tools and methodologies to assist a wide range of users, from professionals and domain experts to general citizens, in benefiting from EO data. Its key features include:

  • Knowledge Graph-based Decision Making: Facilitates insightful feature extraction from diverse repositories, enabling a more comprehensive understanding of datasets.
  • AI/ML Marketplace: A centralized hub for AI & ML models, algorithms, techniques, and metadata.
  • Big Data Processing Engines: Optimized for cloud environments to efficiently manage large-scale datasets.
  • User-friendly Interfaces: GUI, CLI, APIs, and immersive VR experiences, targeting both technical and non-technical users.
  • Workflow Engine: Simplifies the definition and execution of recurring tasks for EO data retrieval and processing.

How to cite: Fornari, F., Baousis, V., Albughdadi, M., Antonacci, M., Kaprol, T., Pisa, C., Andreou, C., Panagidi, K., and Hadjiefthymiades, S.: Copernicus data and services uptake with EO4EU platform: an AI-augmented ecosystem for Earth Observation data accessibility and exploitation., EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-10683, https://doi.org/10.5194/egusphere-egu25-10683, 2025.

14:12–14:22
|
EGU25-11810
|
On-site presentation
Antonis Troumpoukis, Mohanad Albughdadi, Martin Welß, Vasileios Baousis, and Iraklis Klampanos

The DeployAI project [1] designs and delivers a fully operational European AI-on-Demand Platform (AIoDP) to empower the European industry with access to cutting-edge AI technology, and to promote trustworthy, ethical, and transparent European AI solutions, with a focus on SMEs and the public sector. To achieve this, the platform enables the development and deployment of AI solutions through the following core solutions: (i) AI Builder [2], which allows the assembling of reusable AI modules into AI pipelines; (ii) seamless access to Cloud and HPC infrastructures (e.g., MeluXina and LUMI); (iii) a marketplace for the listing and distribution of ready-to-use AI products; (iv) an expansive and growing library of diverse AI-driven use cases.

As part of its domain-driven solutions, AIoDP seeks to empower Environmental Scientists, AI Engineers, Developers, Researchers, and SMEs via the DeployAI Earth Observation Services. These services will accelerate the development of AI-driven environmental applications, by providing pre-trained models that simplify satellite imagery processing, land usage classification, and image segmentation. Key models available as modules within the DeployAI’s AI Builder include:

  • Leaf Area Index (LAI) Model: Enables precise monitoring of vegetation health and ecological dynamics by calculating leaf area per unit ground area [3].
  • Object Detection Model: Identifies specific objects in high-resolution satellite images, supporting applications such as infrastructure monitoring, pollution tracking, and deforestation assessment [4].
  • Segment Anything Model (SAM): Simplifies analysis across diverse environmental applications through SAM’s flexible, prompt-based image segmentation for new datasets, with zero-shot and few-shot learning [5].

These models, along with the broader functionalities of AI Builder, enable users to create custom AI pipelines that address their specific environmental challenges in several environmental areas, including vegetation health monitoring, water balance analysis, climate modeling, urban planning, traffic management, pollution monitoring, and infrastructure maintenance. Users can leverage the visual pipeline editor to easily assemble pipelines from reusable AI modules without needing to write code. Once created, these pipelines can be deployed as AI applications on various execution environments. DeployAI facilitates seamless transitions between these environments by providing connectors to a host of target infrastructures, including Cloud platforms and HPC systems. This empowers users to leverage the most suitable computational resources for their specific needs.

By providing a user-friendly platform with access to cutting-edge AI technology and Cloud/HPC resources, DeployAI empowers users to address critical environmental challenges and unlock new possibilities for sustainable development.

[1] https://deployaiproject.eu
[2] https://gitlab.eclipse.org/eclipse/graphene
[3] https://github.com/DeployAI-Environmental-Services/depai-lai
[4] https://github.com/DeployAI-Environmental-Services/depai-yolov8-obb
[5] https://github.com/DeployAI-Environmental-Services/depai-sam-interactive

This work has received funding from the European Union’s Digital Europe Programme (DIGITAL) under grant agreement No 101146490.

How to cite: Troumpoukis, A., Albughdadi, M., Welß, M., Baousis, V., and Klampanos, I.: DeployAI Earth Observation Services: Enabling Environmental Insights on the European AI-on-Demand Platform, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-11810, https://doi.org/10.5194/egusphere-egu25-11810, 2025.

14:22–14:32
|
EGU25-8754
|
ECS
|
Virtual presentation
Georgios Charvalis, Panagiota Louka, Vassileios Gkoles, Thanasis Manos, Nikos Kalatzis, Dionysios Solomos, Anastasios Trypitsidis, and Odysseas Sekkas

Cloud infrastructures play a significant role in delivering secure, scalable and efficient data processing for Earth Observation (EO) and agricultural management applications. As part of the ScaleAgData project, we present a hierarchical Agri-Environmental Monitoring Tool running on a private cloud infrastructure. The system combines data from EO, in-situ sensors and farm management information systems (FMIS), including parcel calendars, to provide farmers and policymakers with multi-scale insights.

The solution is cloud-based and designed with an underlying architecture that ensures both scalability and interoperability, leveraging OGC-compliant data formats where applicable. EO and in-situ data streams can be processed and analyzed efficiently with the help of containerized apps and microservices to facilitate modular development and simplify deployment. By using a web-based dashboard with hierarchical design, stakeholders can navigate from overviews at the municipal level to individual parcels. Aggregated summaries that comply with Common Agricultural Policy (CAP) criteria are useful to policymakers, while farmers can get comprehensive parcel-level metrics to optimize irrigation, pesticide use and other agro-related activities.

Specifically, the tool combines EO data to derive vegetation indices (e.g., NDVI, EVI) and other parameters requiring advanced processing for crop type classification. Furthermore, these datasets are enriched with in-situ sensor measurements (e.g. soil moisture, weather data) and farm logs managed within FMIS (irrigation schedule, pesticide usage). Parcel-level data (L1) is processed to generate statistics, which are then calibrated against data from nearby parcels with similar properties and crop type (L2), serving as a control level, and finally extrapolated to the municipal level (L3) using spatial averaging techniques to provide indicators related to irrigation water, pesticide, and fertilizer usage. Farm calendars stored within FMIS provide a reliable source of ground-truth data, enhancing the tool’s ability to validate aggregated metrics. The aggregation at L2 and L3 allows for the identification of regional trends and patterns in agricultural practices, empowering policymakers and stakeholders to implement targeted interventions at both levels, thereby promoting sustainable agriculture.
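As an illustration of the vegetation-index step, NDVI is a simple band ratio. The sketch below uses synthetic reflectance values and placeholder band arrays, not the tool's actual data interface:

```python
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + eps)  # eps guards against 0/0

# Synthetic surface-reflectance values for a 2x2 parcel patch.
red_band = np.array([[0.10, 0.12], [0.30, 0.05]])
nir_band = np.array([[0.50, 0.48], [0.32, 0.45]])

index = ndvi(nir_band, red_band)
# Dense vegetation pushes NDVI toward 1; bare soil sits near 0.
```

In practice the bands would come from EO scenes clipped to parcel geometries, with the resulting index aggregated to the L1/L2/L3 levels described above.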

This work showcases the potential of private cloud infrastructures to enhance agri-environmental monitoring by processing and integrating heterogeneous data streams (EO, in-situ sensors and farm log data) into a unified system. The system is being applied in diverse agricultural regions of Greece (Crete, Thessaly, Macedonia) with ongoing validation efforts aimed at refining its accuracy and adaptability. Future work includes the integration of cloud-based machine learning models and EO-derived evapotranspiration data to enhance the efficiency of extrapolating parcel-level (L1) and regional (L2) metrics into policy-level indicators (L3). Additionally, alternative aggregation methods, such as model-based approaches, spatial regression, and interpolation techniques like Kriging, will be tested to improve the accuracy and reliability of aggregated insights. 

How to cite: Charvalis, G., Louka, P., Gkoles, V., Manos, T., Kalatzis, N., Solomos, D., Trypitsidis, A., and Sekkas, O.: Leveraging Cloud, Earth Observation and In-Situ Sensors for Agri-Environmental Monitoring and Policy Decision-Making, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-8754, https://doi.org/10.5194/egusphere-egu25-8754, 2025.

14:32–14:42
|
EGU25-15406
|
On-site presentation
Armagan Karatosun and Vasileios Baousis

The growing volume of Earth Observation (EO) and Earth modeling data makes it increasingly impractical to download and analyze locally. Furthermore, as cloud-native data formats and AI/ML-driven models gain popularity, the community requires powerful computing and storage solutions to efficiently process and analyze EO data. High-performance computing (HPC) and cloud infrastructures can help, but both bring significant challenges in maintaining those resources, placing additional workload on scientists and developers.

In this paper, we will present our solution, which uses cloud-native technologies and a “Control Plane” approach to seamlessly interact with HPC scheduling endpoints like SLURM and PBS, as well as cloud infrastructure resources, allowing HPC jobs to be submitted and monitored directly from a Kubernetes-based infrastructure. In contrast to traditional IT architecture, Platform Engineering is concerned with lowering operational complexity by introducing control planes to provide self-service capabilities. By abstracting away the complexities of the underlying infrastructure, this method gives teams a customized, scalable, and dependable environment to suit their unique requirements. We will thoroughly analyze existing technologies, including their methodologies, strengths, limits, and potential as universal solutions. Furthermore, we will assess their adaptation to various cloud and HPC infrastructures, providing insights into their suitability for larger applications. 
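At its core, such a control plane generates batch scripts and submits them to a scheduler endpoint on the user's behalf. The following is a minimal, hypothetical sketch of that step for SLURM; the helper names and job content are illustrative, not the presented system's API:

```python
import subprocess

def make_sbatch_script(job_name, command, nodes=1, time="01:00:00"):
    """Render a minimal SLURM batch script for a containerized EO task."""
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --time={time}",
        command,
    ]) + "\n"

def submit(script_text):
    """Pipe the script to sbatch; returns the scheduler's response line."""
    result = subprocess.run(
        ["sbatch"], input=script_text, text=True, capture_output=True
    )
    result.check_returncode()
    return result.stdout.strip()  # e.g. "Submitted batch job 12345"

script = make_sbatch_script("eo-preproc", "srun ./process_tile.sh")
# On a system with SLURM available, a controller pod would now call:
# job_id = submit(script)
```

In a Kubernetes-based control plane, a custom controller would run this logic against the cluster's SLURM (or PBS) endpoint and poll the scheduler to reflect job state back into Kubernetes resources.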

We will conclude our discussion with practical examples showing how the technical benefits of these two computing paradigms, combined with the Platform Engineering approach, may be effectively used in real-world EO data processing scenarios.

How to cite: Karatosun, A. and Baousis, V.: Platform Engineering for Earth Observation: A Unified Approach to HPC and Cloud Systems, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-15406, https://doi.org/10.5194/egusphere-egu25-15406, 2025.

14:42–14:52
|
EGU25-12070
|
ECS
|
On-site presentation
Sergi Palomas, Mario Acosta, Gladys Utrera, Okke Lennart, Daniel Beltran, Miguel Castrillo, Niclas Schroeter, and Ralf Mueller

The computational intensity of climate models makes them among the most energy-demanding applications in High-Performance Computing (HPC), resulting in significant computational costs and carbon emissions. Addressing the dual challenge of improving climate predictions —by running higher resolution, more accurate and complex models— and ensuring sustainability requires innovative tools to evaluate both computational efficiency and energy consumption across diverse HPC architectures. To address this, and in the context of the Center of Excellence in Simulation of Weather and Climate in Europe (ESiWACE), we have extended the High-Performance Climate and Weather Benchmark (HPCW) framework to incorporate a standardised set of Climate Performance Metrics for Intercomparison Projects (CPMIPs) and energy consumption monitoring.

HPCW, originally designed to maintain a set of relevant and realistic, near-operational weather forecast workloads to benchmark HPC sites, can provide insights beyond generic benchmarks like High-Performance Linpack (HPL) or High-Performance Conjugate Gradients (HPCG) by focusing on domain-specific workloads.

The inclusion of CPMIPs into HPCW brings a widely accepted set of metrics specifically tailored to the particularities of climate workflows. These metrics, already recognized by the scientific community, are key to better understanding climate model performance and allow us to keep the results from the framework relevant for research and operational runs, as well as improving our capacity for multi-model multi-platform performance comparisons.

By integrating energy monitoring, HPCW enables users to evaluate how critical computational kernels in climate models perform in terms of energy consumption. Our review of energy profiling tools across EuroHPC pre-exascale systems, including MareNostrum 5, LUMI, and Leonardo, highlights a fragmented landscape. Current tools offer varying granularity and portability, but limitations such as system configurations, administrative restrictions, and hardware compatibility often hinder their application. Low-level interfaces like Running Average Power Limit (RAPL) and Performance Application Programming Interface (PAPI) counters offer precise energy measurements but are constrained by accessibility issues.
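As an example of the low-level route, on Linux systems that expose RAPL through the powercap interface, the package energy counter can be read from sysfs. The sketch below is a generic illustration, not the framework's instrumentation: paths vary by system, access may require elevated privileges, and the counter wraps at `max_energy_range_uj`.

```python
from pathlib import Path

RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")  # package 0, if exposed

def read_energy_uj(rapl_dir=RAPL_DIR):
    """Read the cumulative package energy counter, in microjoules."""
    return int((rapl_dir / "energy_uj").read_text())

def energy_delta_joules(before_uj, after_uj, max_range_uj):
    """Energy between two readings, handling one counter wraparound."""
    if after_uj < before_uj:            # counter wrapped past its maximum
        after_uj += max_range_uj
    return (after_uj - before_uj) / 1e6  # microjoules -> joules

# A measurement loop would sample the counter around a kernel:
#   e0 = read_energy_uj(); run_kernel(); e1 = read_energy_uj()
delta = energy_delta_joules(5_000_000, 8_500_000, 262_143_328_850)  # 3.5 J
```

The accessibility constraints mentioned above show up here directly: on many HPC systems the `energy_uj` files are root-only or hidden from batch jobs, which is why portable tooling on top of RAPL/PAPI remains fragmented.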

These advancements aim to improve the allocation of climate experiments, such as those conducted for the Intergovernmental Panel on Climate Change (IPCC) in Coupled Model Intercomparison Projects (CMIPs), to the most suitable HPC resources, while also identifying architectural bottlenecks before running production experiments. Additionally, by enhancing energy consumption quantification, this work contributes to ongoing efforts to measure and reduce the carbon footprint of the climate research community. Furthermore, these analyses are expected to be particularly valuable for climate researchers, especially in the context of upcoming large-scale initiatives like CMIP7, enabling them to make informed resource requests and facilitate robust multi-platform comparisons of climate model performance which were not possible in the past. We anticipate that HPC vendors can also benefit from the outcomes of our work in optimising the systems for climate modelling workloads. By combining performance and energy metrics within a unified framework, we provide critical insights that align computational advancements with sustainability goals, ensuring efficient and environmentally conscious use of HPC resources for climate research.

How to cite: Palomas, S., Acosta, M., Utrera, G., Lennart, O., Beltran, D., Castrillo, M., Schroeter, N., and Mueller, R.: Performance Benchmarking and Energy monitoring for Climate Modelling, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-12070, https://doi.org/10.5194/egusphere-egu25-12070, 2025.

14:52–14:53
14:53–15:03
|
EGU25-21202
|
On-site presentation
Christian Briese, Christoph Reimer, Christian Briese, Christoph Reck, Dimitrios Papadakis, Michele Claus, Gunnar Brandt, Anne Fouilloux, and Tina Odaka

Over the past decade, the operational Copernicus Sentinels Data Processors have generated vast amounts of Earth observation data, supporting various scientific and commercial applications. However, the current format used by ESA to provide Copernicus data, known as SAFE (Standard Archive Format for Europe), has become outdated. To address this, ESA has initiated the transition to a new Zarr-based data format. The Earth Observation Processing Framework (EOPF) Sample Service is ESA’s official initiative to support this transition by providing early access to the new format for users. This shift is essential for creating a cloud-native and interoperable solution that enhances data accessibility and integration with modern processing frameworks. The primary goal is to standardize data formats across Sentinel missions, enable scalable processing on cloud platforms, and ensure compatibility with contemporary data science tools. This initiative is crucial for minimizing disruption and ensuring continuity for users, applications, and services built around existing data formats.

The EOPF Sample Service comprises several key components. The EOPF Core Platform re-formats ingested SAFE data products into the new cloud-optimized EOPF Zarr data products and provides data access via STAC API and S3 API. To ensure timely conversion, the platform utilizes Argo Events and the Copernicus Data Space Ecosystem's subscription service. This platform is maintained by experts from EODC and DLR. The EOPF User Platform offers additional user services, including JupyterHub (BinderHub), Dask, and a STAC Browser, which are essential for supporting user adoption by lowering the entrance barrier to cloud applications and data discovery capabilities. The service is designed to make use of advanced technologies such as Kubernetes for container orchestration and Dask for parallel computing. User and identity management is achieved in cooperation with the Copernicus Data Space Ecosystem.
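The cloud optimization rests on Zarr's store layout: every chunk of an array is a separate object, so a client can fetch exactly the pieces it needs over the S3 API. A small sketch of that mapping, following the Zarr v2 key convention (the bucket, array names, and chunk sizes are made up for illustration):

```python
def zarr_chunk_key(array_path, chunk_index):
    """Zarr v2 chunk object key: dimension indices joined by '.'."""
    return f"{array_path}/" + ".".join(str(i) for i in chunk_index)

def chunks_for_window(shape, chunks, start, stop, axis=0):
    """Chunk indices along one axis covering the slice [start, stop)."""
    first = start // chunks[axis]
    last = (stop - 1) // chunks[axis]
    return list(range(first, last + 1))

# A (time, y, x) array of shape (365, 10980, 10980) chunked (1, 1024, 1024):
# a 3-day window touches only 3 of the 365 time chunks.
needed = chunks_for_window((365, 10980, 10980), (1, 1024, 1024), 10, 13)
keys = [zarr_chunk_key("s3://eopf-samples/s2l2a.zarr/b04", (t, 0, 0))
        for t in needed]
```

Libraries such as Xarray and Zarr-Python perform this index-to-object translation automatically; the STAC API layer sits on top for discovering which store to open in the first place.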

User adoption is further facilitated through Jupyter Notebooks designed by experts within the consortium, including members from the Pangeo community. These notebooks showcase the use of the new format within the community and are continuously improved by incorporating user feedback. In addition, enhancements are made to widely-used software tools like GDAL to support the new format, with practical demonstrations available through Jupyter Notebooks. The consortium selected by ESA to carry out this implementation includes experts from Brockmann Consult, DLR, Ifremer, EURAC, Evenflow, Simula, and EODC, each contributing their specialized knowledge in Earth observation, data management, and user engagement.

This contribution aims to present the EOPF Sample Service initiative and the current status of its implementation. The first Jupyter Notebooks demonstrating the new format will also be showcased, providing users with an intuitive and user-friendly interface for accessing and processing sample data in the new EOPF format.

How to cite: Briese, C., Reimer, C., Briese, C., Reck, C., Papadakis, D., Claus, M., Brandt, G., Fouilloux, A., and Odaka, T.: From SAFE to Zarr: The EOPF Sample Service Initiative, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-21202, https://doi.org/10.5194/egusphere-egu25-21202, 2025.

15:03–15:13
|
EGU25-17137
|
On-site presentation
Dr. Julia Wagemann, Sabrina Szeto, Emmanuel Mathot, and James Banting

Zarr is a key component of the Pangeo ecosystem and instrumental for effectively accessing and processing multi-dimensional Earth data in cloud-based systems. More and more leading satellite data providers are exploring the transition of their data archives to a cloud environment. 

As part of the ESA Copernicus Earth Observation Processor Framework (EOPF), ESA is in the process of providing access to “live” sample data from the Copernicus Sentinel-1, -2 and -3 missions in the new Zarr data format. This set of reprocessed data allows users to try out accessing and processing data in the new format and to experience its benefits with their own workflows.

To help Sentinel data users experience and adopt the new data format, a set of resources called the Sentinels EOPF Toolkit is being developed. Development Seed, SparkGeo and thriveGEO, together with a group of champion users (early adopters), are creating a set of Jupyter Notebooks, plug-ins and libraries that showcase the use of Sentinel data in Zarr for applications across multiple domains and user communities, including users of Python, Julia, R and QGIS.

This presentation will give a demo of the first set of notebooks and plugins of the Sentinels EOPF toolkit that were developed and that facilitate the adoption of the Zarr data format for Copernicus Sentinel data users. Additionally, we will give an overview of toolkit developments and community activities that are planned throughout the project period.

How to cite: Wagemann, Dr. J., Szeto, S., Mathot, E., and Banting, J.: The Sentinels EOPF Toolkit: Community Notebooks and Plug-ins for using Copernicus Sentinel Data in Zarr format, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-17137, https://doi.org/10.5194/egusphere-egu25-17137, 2025.

15:13–15:23
|
EGU25-21279
|
On-site presentation
Deyan Samardzhiev, Anne Fouilloux, Tina Odaka, and Benjamin Ragan-Kelley

EarthCODE (Earth Science Collaborative Open Development Environment) is a platform that leverages cloud-native tools to empower Earth system researchers in accessing, analyzing, and sharing data across distributed infrastructures, such as the Copernicus Data Space Ecosystem and Deep Earth System Data Laboratory (DeepESDL). By integrating Pangeo ecosystem tools—including Xarray, Dask, and Jupyter—EarthCODE supports scalable, FAIR-aligned workflows tailored to the challenges of Earth system science.

EarthCODE streamlines cloud-based data analysis and publishing by enabling collaborative research through interoperable workflows for analyzing complex datasets, including satellite observations, climate models, and in-situ measurements. Researchers can publish their analyses and workflows as reusable, executable resources in EarthCODE’s science catalog, fostering alignment with open science principles.

Through its integration of Pangeo tools, EarthCODE offers an intuitive environment for reproducibility, scalability, and collaboration, bridging the gap between data analysis and actionable insights. This presentation will demonstrate EarthCODE’s capabilities, including live, executable Jupyter notebooks that highlight its potential for sharing workflows and engaging diverse user groups. EarthCODE exemplifies the transformative power of cloud-native research, promoting open science and advancing the accessibility of Earth system data.

How to cite: Samardzhiev, D., Fouilloux, A., Odaka, T., and Ragan-Kelley, B.: Advancing Cloud-Native Data Analysis and Publishing with Pangeo Tools in EarthCODE, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-21279, https://doi.org/10.5194/egusphere-egu25-21279, 2025.

15:23–15:33
|
EGU25-14400
|
ECS
|
On-site presentation
Max Jones, Aimee Barciauskas, Jonas Sølvsteen, Brian Freitag, Yuvi Panda, Kyle Barron, Julia Signell, Alex Mandel, Chuck Daniels, Nathan Zimmerman, Sean Harkins, Henry Rodman, Zac Deziel, Slesa Adhikari, Anthony Boyd, Alexandra Kirk, David Bitner, and Vincent Sarago

To enable wider participation in open science with geospatial data at scale, we need to reduce the effort and custom approaches required for setting up scalable scientific data analysis environments and computing workflows. We have made great strides in this pursuit by evolving and promoting community-developed open source frameworks, tools, and libraries for cloud-native data access and analysis, making them the default for scientists on the public cloud and local systems.

Many of our achievements have been supported by the NASA Visualization, Exploration, and Data Analysis (VEDA) project which seeks to proliferate cloud-native approaches for open science on Earth science data from NASA’s rich archives and many other providers. Our presentation highlights how we have engaged with communities like Pangeo, OpenScapes, Earth Science Information Partners, and the Cloud Native Geospatial Forum to build joint initiatives, target development, and ensure uptake of new solutions. We present key results from working groups, community showcases, and hackdays and hackweeks organized by VEDA team members, as well as specific contributions to the open source ecosystem, including the eoAPI platform for quickly and easily deploying an open-source Earth Observation stack, JupyterHub fancy profiles (with BinderHub) for seamless environment building, and Lonboard for fast, interactive vector visualization.

How to cite: Jones, M., Barciauskas, A., Sølvsteen, J., Freitag, B., Panda, Y., Barron, K., Signell, J., Mandel, A., Daniels, C., Zimmerman, N., Harkins, S., Rodman, H., Deziel, Z., Adhikari, S., Boyd, A., Kirk, A., Bitner, D., and Sarago, V.: A community oriented approach to enabling open science with Earth science data at scale, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-14400, https://doi.org/10.5194/egusphere-egu25-14400, 2025.

15:33–15:43
Coffee break
Chairpersons: Ross A. W. Slater, Anne Fouilloux
16:15–16:25
|
EGU25-20676
|
On-site presentation
Scott Henderson, David Shean, Jack Hayes, and Shashank Bhushan

NASA established the Surface Topography and Vegetation (STV) Incubation program to develop and mature the next-generation measurement approaches to precisely map Earth’s changing surface and overlying vegetation structure, and prepare for a dedicated satellite mission within the next decade. Over the past two decades, large archives of 3D surface elevation measurements by airborne and satellite instruments including LiDAR, altimeters, Synthetic Aperture Radar, and stereo optical imagery have been systematically collected, though not always in a coordinated way. Yet, many of these datasets are fortuitously acquired over the same location within a short temporal window (e.g., <1-14 days) and many are now publicly available and hosted on the cloud. In theory, this is a great opportunity to synthesize myriad elevation measurements for STV researchers, but in practice merging these datasets accurately for scientific analysis requires dealing with numerous data formats, complex 4D coordinate reference systems, and securing access to significant computational resources.

We are developing an open-source Python library to identify, curate, and efficiently process coincident elevation measurements spanning the last several decades. This work would not be possible without well-integrated geospatial libraries (e.g. Geopandas, Xarray, Dask), as well as emerging cloud-native data and metadata formats such as Cloud-Optimized GeoTIFF and STAC-GeoParquet. We will describe our work to date and reflect on the process of collaborative development across libraries, on our increasing reliance on cloud resources, and on current and future research directions.
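
The data-discovery step described above can be sketched with community tooling. The snippet below is a minimal illustration, not the authors' library: it assumes pystac-client is installed (imported lazily), and the STAC endpoint, collection names and 14-day grouping rule are illustrative assumptions. The coincidence logic itself lives in a pure-Python helper.

```python
from datetime import datetime

def group_by_window(times, max_gap_days=14):
    """Cluster timestamps so consecutive members of a group are no more
    than max_gap_days apart (the short coincidence window above)."""
    groups, current = [], []
    for t in sorted(times):
        if current and (t - current[-1]).days > max_gap_days:
            groups.append(current)
            current = []
        current.append(t)
    if current:
        groups.append(current)
    return groups

def find_coincident(stac_url, collections, aoi_geojson, time_range):
    """Query a STAC API for acquisitions over one area and cluster them
    into coincident groups (pystac-client assumed; imported lazily)."""
    from pystac_client import Client
    search = Client.open(stac_url).search(
        collections=collections, intersects=aoi_geojson, datetime=time_range)
    return group_by_window([item.datetime for item in search.items()])
```

Keeping `group_by_window` separate from the STAC query makes the coincidence rule testable without network access.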

How to cite: Henderson, S., Shean, D., Hayes, J., and Bhushan, S.: Integrated geospatial Python libraries for efficient analysis of modern elevation measurements, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-20676, https://doi.org/10.5194/egusphere-egu25-20676, 2025.

16:25–16:35
|
EGU25-14610
|
Virtual presentation
Nishadh Kalladath, Masilin Gudoshava, Shruti Nath, Jason Kinyua, Fenwick Cooper, Hannah Kimani, David Koros, Christine Maswi, Zacharia Mwai, Asaminew Teshome, Samrawit Abebe, Isaac Obai, Jesse Mason, Ahmed Amdihun, and Tim Palmer

The Ensemble Prediction System (EPS) output provided by global weather forecast centres generates vast amounts of data that are crucial for early warnings of extreme weather and climate events. However, regional and national meteorological services often face challenges in processing this data efficiently, particularly during regional downscaling and post-processing. Conventional methods of downloading and storing GRIB-format data have become increasingly inefficient and unsustainable. The Strengthening Early Warning Systems for Anticipatory Actions (SEWAA) project aims to address these challenges by exploring the use of cloud-native operations and GenAI-cGAN-driven post-processing systems.

Kerchunk provides a groundbreaking solution for real-time weather data streaming, supporting the transition by global weather forecasting centres towards open, free-to-use cloud-based object storage. In conjunction with GRIB index files, Kerchunk enables efficient, real-time access to weather data, fostering more sustainable workflows in weather and climate services and thus strengthening early warning systems.

This study developed a workflow for streaming forecast data using Kerchunk with two primary objectives:  

1. Using GRIB index files to reduce redundant reads and generate Kerchunk reference files.

2. Converting the reference files into virtual Zarr datasets accessed in a streaming-like fashion, using Dask compute for scalable data handling.

The methodology utilised recent improvements in the Kerchunk library that integrate GRIB scanning with its index files. This allowed the system to sample subsets of the GRIB corpus instead of processing entire Forecast Model Run Collections (FMRC), significantly optimising performance.  

The workflow was implemented using cloud-based compute operations via the Coiled Python library and its service on the Google Cloud Platform. A Dask cluster, managed through Coiled, enabled the creation of virtual Zarr datasets for analysis and visualisation. This streaming approach loads weather data into memory on demand, avoiding unnecessary data downloads and duplication.

We validated the solution with NOAA GFS/GEFS datasets stored as open datasets in an AWS S3 bucket. The optimised workflow demonstrated remarkable efficiency, requiring only <5% of the original GRIB data to be read, with the rest replaced by index files as input for reference-file creation. The Kerchunk reference files were then converted into virtual Zarr datasets by Dask clusters, allowing a region such as East Africa to be processed in minutes and supporting near-real-time applications across spatial and temporal scales.

This approach significantly enhances post-processing workflows for EPS weather forecasts, bolstering early warning systems and anticipatory action. Future work will focus on using the method to scale training datasets and to improve the cost efficiency of cGAN training, advancing operational early warning systems. This solution directly addresses the challenges meteorological services face in processing massive weather datasets, providing a scalable, cost-effective foundation for GenAI-based post-processing and improved early warning systems.
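
The two numbered objectives above can be sketched roughly as follows. This is a non-authoritative sketch, not the SEWAA codebase: kerchunk, fsspec and xarray are assumed installed (imported lazily), and the dimension names and anonymous-access S3 options are illustrative.

```python
def grib_to_references(grib_urls, storage_options=None):
    """Objective 1 (sketch): scan remote GRIB files into Kerchunk
    references and combine them along the time dimension."""
    from kerchunk.grib2 import scan_grib
    from kerchunk.combine import MultiZarrToZarr
    opts = storage_options or {"anon": True}
    refs = [msg for url in grib_urls for msg in scan_grib(url, storage_options=opts)]
    return MultiZarrToZarr(refs, concat_dims=["time"],
                           identical_dims=["latitude", "longitude"]).translate()

def open_as_virtual_zarr(references):
    """Objective 2 (sketch): open the combined references as a lazy
    virtual Zarr dataset, streaming bytes on demand rather than
    downloading whole GRIB files."""
    import fsspec
    import xarray as xr
    fs = fsspec.filesystem("reference", fo=references,
                           remote_protocol="s3", remote_options={"anon": True})
    return xr.open_zarr(fs.get_mapper(""), consolidated=False)
```

The returned dataset can then be handed to a Dask cluster for regional post-processing as described in the abstract.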

How to cite: Kalladath, N., Gudoshava, M., Nath, S., Kinyua, J., Cooper, F., Kimani, H., Koros, D., Maswi, C., Mwai, Z., Teshome, A., Abebe, S., Obai, I., Mason, J., Amdihun, A., and Palmer, T.: Weather Data Streaming with Kerchunk: Strengthening Early Warning Systems , EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-14610, https://doi.org/10.5194/egusphere-egu25-14610, 2025.

16:35–16:45
|
EGU25-8591
|
On-site presentation
Fabian Wachsmann

We present recent progress around the EERIE cloud data server (https://eerie.cloud.dkrz.de) and its software stack “cloudify”. The EERIE cloud provides efficient open access to prominent climate datasets stored on disk at the German Climate Computing Center (DKRZ).

A new Kerchunk plugin enables access to raw model output as-is, allowing verifiable data transfer with lower latency. STAC (SpatioTemporal Asset Catalog) catalogs are deployed and displayed through the EERIE cloud to make the provided DKRZ datasets findable and accessible. Two in-browser apps, pre-configured for each dataset, can be started at the click of a button: (1) the data visualization app “gridlook” and (2) a JupyterLite instance for interactive analysis and monitoring.

We leverage the Python package xpublish, a plugin for Pangeo's central analysis package Xarray. Its main feature is to provide ESM output by mapping any input data to virtual Zarr datasets. Users can retrieve these datasets as if they were cloud-native and cloud-optimized.
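
The xpublish pattern described here can be illustrated in a few lines. This is a sketch under the assumption of a recent xpublish release; the dataset name and port are placeholders, and xpublish is imported lazily.

```python
def serve_as_virtual_zarr(datasets, host="0.0.0.0", port=9000):
    """Serve a mapping of names to Xarray datasets as Zarr-compatible
    HTTP endpoints, so clients can read them as if they were
    cloud-native stores (xpublish assumed installed)."""
    import xpublish
    rest = xpublish.Rest(datasets)    # e.g. {"eerie-demo": some_xr_dataset}
    rest.serve(host=host, port=port)  # blocking; exposes /datasets/<name>/...
```

On the client side, the served endpoints can be opened with `xr.open_zarr` against the server URL, which is what makes the data appear cloud-native to users.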

How to cite: Wachsmann, F.: The EERIE cloud: Apps and Catalogs for Cloudified Earth System Model Output, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-8591, https://doi.org/10.5194/egusphere-egu25-8591, 2025.

16:45–16:47
16:47–16:57
|
EGU25-13873
|
On-site presentation
John Clyne, Hongyu Chen, Philip Chmielowiec, Orhan Eroglu, Cecile Hannay, Robert Jacob, Rajeev Jain, Brian Medeiros, Paul Ullrich, and Colin Zarzycki

Over the past decade, weather and climate models have rapidly adopted unstructured meshes to better leverage high-performance computing systems and approach kilometer-scale resolutions. Output from this new generation of models presents many challenges for subsequent analysis, largely due to a lack of community tools supporting unstructured grid data. Last year, we introduced UXarray, a class extension of Xarray that provides native support for unstructured meshes. UXarray readily runs in a Jupyter Notebook and offers parallelized execution through its compatibility with Dask, demonstrating its flexibility as both a tool for lightweight exploration and communication, and for supporting intensive calculations applied to vast data volumes. Over the past year, UXarray has matured significantly and is now capable of supporting many real-world analysis workflows applied to outputs from a growing number of high-resolution models and dynamical cores, including the ICOsahedral Non-hydrostatic (ICON) atmosphere model, the Finite-Element/volumE Sea ice-Ocean Model (FESOM), NSF NCAR’s Model for Prediction Across Scales (MPAS), and the U.S. DOE’s Energy Exascale Earth System Model (E3SM). This presentation will provide an overview of UXarray’s current capabilities, which include extensive support for plotting and many foundational analysis operators; demonstrate examples in Jupyter Notebooks; present plans for the future; and discuss ways for Pangeo and the broader earth system science community to help guide new developments.
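
A minimal sketch of opening unstructured-grid output with UXarray follows. The file paths are placeholders, uxarray is assumed installed (imported lazily), and the `chunks` pass-through follows Xarray conventions for Dask-backed lazy computation.

```python
def open_unstructured(grid_file, data_file, chunks=None):
    """Open unstructured model output (e.g. MPAS, ICON, FESOM or E3SM)
    as a UXarray dataset; the separate grid file carries the mesh
    topology, and chunks enables Dask-parallelized execution."""
    import uxarray as ux
    return ux.open_dataset(grid_file, data_file, chunks=chunks)
```

The resulting object behaves like an Xarray dataset but carries the mesh, so plotting and the analysis operators mentioned above act on the native unstructured grid.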

How to cite: Clyne, J., Chen, H., Chmielowiec, P., Eroglu, O., Hannay, C., Jacob, R., Jain, R., Medeiros, B., Ullrich, P., and Zarzycki, C.: UXarray: Extending Xarray for Enhanced Support of Unstructured Grids, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-13873, https://doi.org/10.5194/egusphere-egu25-13873, 2025.

16:57–17:07
|
EGU25-14306
|
ECS
|
On-site presentation
Kayziel Martinez, Alexander Kmoch, Lőrinc Mészáros, Andrew Nelson, and Evelyn Uuemaa

Accurate and efficient spatial analysis is crucial for the mapping and sustainable management of marine environments, where large-scale and diverse datasets present significant analytical challenges. Traditional latitude-longitude methods, while widely used, often encounter limitations in data integration and in handling distortion caused by Earth’s curvature. Discrete Global Grid Systems (DGGS) have emerged as a promising solution, offering a hierarchical, global, and equal-area framework for geospatial analysis. Despite their potential, their performance in marine spatial analysis remains underexplored.

This study evaluates the impact and suitability of DGGS-based spatial analysis by comparing its performance with that of traditional latitude-longitude approaches. Using marine datasets representing point and raster data formats, the workflow begins with quantization, converting the data into DGGS cells. The implementation utilizes open-source Python tools from the Pangeo ecosystem, including xarray-xdggrid, to enable seamless integration and efficient analysis of large geospatial datasets. Three DGGS configurations – ISEA7H, HEALPix, and ISEA3H – are compared alongside a traditional latitude-longitude grid for computational efficiency (processing time and memory usage) and their ability to preserve spatial patterns. Spatial analysis methods include density estimation, nearest neighbor evaluation, and clustering for point data, as well as zonal statistics, spatial autocorrelation, and resampling for raster data.

To further illustrate the application of DGGS-based methods, the study includes a case study on estuary characterization. This characterization relies on spatial analysis methods, integrating physical oceanographic parameters from Delft3D-FM, biogeochemical and optical data products, and in-situ point measurements from the Copernicus Marine Environment Monitoring Service (CMEMS). Representing these diverse datasets within the DGGS framework highlights its ability to manage varying data types and scales, offering insights into estuarine environments and demonstrating its scalability for addressing complex marine spatial challenges.

Results indicate that DGGS frameworks deliver comparable computational performance while offering consistent spatial representation. Configuration-specific trade-offs influence their effectiveness, emphasizing the importance of aligning DGGS configurations with specific analytical tasks and applications. Findings suggest that DGGS-based methods offer a promising alternative to traditional analysis techniques, providing greater flexibility in adapting to datasets, scale, and resolution. This contributes to more efficient mapping, sustainable marine environmental management, and advancing geospatial applications through open-source tools from the Pangeo ecosystem.
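As an illustration of the quantization step, the xdggs Xarray extension (a Pangeo DGGS tool; whether it is the exact package used in this study is an assumption) attaches DGGS semantics to a dataset's cell-id coordinate:

```python
def decode_dggs_cells(ds):
    """Interpret a dataset's "cell_ids" coordinate as DGGS cells,
    following the xdggs convention: the coordinate is expected to carry
    grid metadata attributes (e.g. grid_name="healpix" plus a resolution
    level). xdggs is assumed installed and is imported lazily."""
    import xdggs
    return ds.pipe(xdggs.decode)  # enables the ds.dggs.* accessors afterwards
```

Once decoded, subsequent operations (zonal statistics, resampling across resolutions) can be expressed against the equal-area cells rather than a latitude-longitude grid.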

How to cite: Martinez, K., Kmoch, A., Mészáros, L., Nelson, A., and Uuemaa, E.: Navigating New Grids: Evaluating DGGS Configurations for Marine Spatial Analysis, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-14306, https://doi.org/10.5194/egusphere-egu25-14306, 2025.

17:07–17:17
|
EGU25-21603
|
ECS
|
Highlight
|
On-site presentation
Even Moa Myklebust, Ola Formo Kihle, and Justus Magin

The RiOMar (River dominated Ocean Margins) case study, part of the FAIR2Adapt (FAIR to Adapt to Climate Change) project (an EU-funded project, grant agreement No 101188256), focuses on supporting science-based climate change adaptation strategies for coastal water quality and marine ecosystem management. The case study uses large environmental datasets, such as sea temperature, salinity, and other marine parameters, to assess and model the impacts of climate change on coastal ecosystems. As part of the FAIR2Adapt project, which aims to enhance the findability, accessibility, interoperability, and reusability of environmental data through the FAIR (Findable, Accessible, Interoperable, and Reusable) principles, the RiOMar case study emphasizes the use of cutting-edge data processing and analysis methods to support adaptive strategies for climate resilience.

In this presentation, we describe our approach to reading the large RiOMar environmental datasets in netCDF format, creating virtual Zarr archives for efficient data handling, and transforming them into a Discrete Global Grid System (DGGS) using the HEALPix grid. Leveraging the Pangeo ecosystem, we use tools such as Kerchunk to simplify access to multiple data sources and parallelize dataset processing using Dask or Cube, enabling scalable analysis of these complex, multi-dimensional data. We will show a comparison of performance between traditional cube-based approaches and Dask, highlighting the advantages of parallelized processing. Furthermore, we will showcase how to interactively visualize these datasets using tools like xdggs and Lonboard, facilitating seamless exploration and analysis of the underlying environmental patterns. This work underscores the potential of open-source tools, scalable computing techniques, and the Pangeo ecosystem to enhance the accessibility and usability of large geospatial datasets in climate adaptation research.
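
The netCDF-to-virtual-Zarr step mentioned above might look roughly like this with the VirtualiZarr package. This is a sketch only: the library's API is still evolving, and the concatenation dimension, file paths, and output name are illustrative assumptions.

```python
def netcdf_to_virtual_zarr(nc_paths, out_json):
    """Build a virtual Zarr view over many netCDF files without copying
    the underlying bytes, then persist it as Kerchunk JSON references.
    virtualizarr and xarray are assumed installed (imported lazily)."""
    import xarray as xr
    from virtualizarr import open_virtual_dataset
    virtual = [open_virtual_dataset(p) for p in nc_paths]
    vds = xr.concat(virtual, dim="time", coords="minimal", compat="override")
    vds.virtualize.to_kerchunk(out_json, format="json")
    return out_json
```

The resulting reference file can then be opened lazily (e.g. via fsspec's reference filesystem) and regridded to HEALPix cells downstream.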

How to cite: Moa Myklebust, E., Formo Kihle, O., and Magin, J.: The Pangeo Ecosystem Supporting Climate Change Adaptation: The FAIR2Adapt RiOMar Case Study, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-21603, https://doi.org/10.5194/egusphere-egu25-21603, 2025.

17:17–17:27
|
EGU25-12918
|
ECS
|
On-site presentation
Etienne Cap, Tina Odaka, Jean-Marc Delouis, Justus Magin, and Mathieu Woillez

The Pangeo-Fish project processes biologging data to analyze fish movement and migration patterns. While SciPy’s convolution methods are robust, they are not optimized for the spherical datasets inherent to Earth system science. To address this limitation, we propose integrating HEALPix convolution, a method designed for spherical operations, into Pangeo-Fish.

HEALPix convolution offers distinct advantages for geophysical data analysis, particularly when dealing with spherical datasets in Earth system science. It uses the HEALPix pixelization as a core Discrete Global Grid System (DGGS), which ensures equally-sized pixels globally, removing distortions common in flat projections. This consistency is crucial for maintaining the physical relevance of convolutions across locations. Additionally, HEALPix’s dyadic property supports flexible, multiscale resolution adjustments, allowing for downscaling while preserving accuracy. Such scalability is essential for studying oceanic environments where areas of interest, like coastal zones and basins, are often resolution-dependent.

Our approach evaluates the performance of HEALPix convolution in comparison to traditional SciPy methods, focusing on its ability to enhance the accuracy of habitat mapping and migration pathway modeling for fish. 

This integration is particularly relevant within the Global Fish Tracking System (GFTS), which operates under the European Union’s Destination Earth (DestinE) initiative. GFTS utilizes datasets from Copernicus Marine Services and the European Tracking Network (ETN) to model fish habitats, spawning grounds, and migration swimways. HEALPix convolution strengthens Pangeo-Fish’s capacity to study species such as tuna and eel that exhibit large-scale, transoceanic migrations.

In conclusion, this work highlights the transformative potential of HEALPix convolution in spherical data processing. By integrating this method, Pangeo-Fish can provide more accurate, scalable, and actionable insights into fish behaviour and habitats, contributing to sustainable management practices and conservation strategies globally.
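
The equal-area and dyadic properties underpinning the argument above follow directly from HEALPix's defining relations, as this pure-NumPy sketch shows (the convolution itself would rely on a HEALPix-aware library such as healpy, which is not shown here):

```python
import numpy as np

def healpix_npix(nside):
    """Number of equal-area pixels at HEALPix resolution parameter nside."""
    return 12 * nside**2

def pixel_area_sr(nside):
    """Area of every pixel in steradians -- identical across the whole
    sphere, which is what keeps convolution kernels physically
    comparable at all locations."""
    return 4 * np.pi / healpix_npix(nside)

def children_per_pixel(nside):
    """Dyadic refinement: doubling nside splits each pixel into exactly
    4 children, enabling lossless multiscale aggregation."""
    return healpix_npix(2 * nside) // healpix_npix(nside)
```

For instance, every pixel at nside=64 (49,152 pixels) has the same area, and each refines into 4 pixels at nside=128, which is the multiscale property exploited for resolution-dependent regions such as coastal zones.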

How to cite: Cap, E., Odaka, T., Delouis, J.-M., Magin, J., and Woillez, M.: Enhancing Pangeo-Fish with HEALPix Convolution: Impact Evaluation and Benefits, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-12918, https://doi.org/10.5194/egusphere-egu25-12918, 2025.

17:27–17:37
|
EGU25-6798
|
On-site presentation
Jean-Marc Delouis, Erwan Allys, Justus Magin, Louise Mousset, and Tina Odaka

A significant challenge in data integration and ML methodologies on cloud infrastructures is accurately determining correlated statistics. Initially, aligning data to a consistent pixel grid is essential, motivating the use of Discrete Global Grid Systems (DGGS). In geophysical studies, data reside on a sphere, and approximating with tangent planes can distort results. Our solution is the HEALPix pixelization as our DGGS framework, standardizing data on a common grid for consistent statistical analysis. HEALPix's unique features, such as its iso-latitude layout and uniform pixel areas, enable the use of spin-weighted spherical harmonics in managing vector fields. This enables the accurate calculation of correlation statistics, such as between velocity and scalar fields on the sphere, while minimizing biases due to spherical approximations. By utilizing the HEALPix framework, well known in cosmology, with TensorFlow or PyTorch as backends, we created the HEALML library. This library facilitates gradient computations of all derived statistics for AI optimization and has been validated on the Pangeo-EOSC platform. It parallelizes the computation of localized spherical harmonics and includes features like scattering covariance calculations, allowing the extraction of more complex nonlinear statistics beyond the power spectrum. We compare these results to traditional 2D planar methods, demonstrating the advantages of sphere-based statistics on platforms like Pangeo-EOSC. Furthermore, we demonstrate HEALML's ability to perform emulation using a substantially smaller dataset. This emphasizes the ways in which incorporating spherical statistical methods into Pangeo-EOSC fosters innovative and efficient statistical analysis within geophysical research.

How to cite: Delouis, J.-M., Allys, E., Magin, J., Mousset, L., and Odaka, T.: Advancing Geophysical Data Analysis: HEALML for Efficient Sphere-Based Statistics on Pangeo-EOSC, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-6798, https://doi.org/10.5194/egusphere-egu25-6798, 2025.

17:37–17:47
17:47–18:00

Posters on site: Fri, 2 May, 10:45–12:30 | Hall X4

The posters scheduled for on-site presentation are only visible in the poster hall in Vienna. If authors uploaded their presentation files, these files are linked from the abstracts below.
Display time: Fri, 2 May, 08:30–12:30
Chairpersons: Ross A. W. Slater, Vasileios Baousis, Tina Odaka
X4.136
|
EGU25-2605
Jean Iaquinta, Anne Fouilloux, and Benjamin Ragan-Kelley

Integrating High-Performance Computing (HPC) and cloud computing in climate sciences is difficult, due to intricate hardware/software, compatibility, performance and reproducibility issues. Here, we address these challenges in a user-friendly way by leveraging the Conda ecosystem and containers.

Containerization makes it possible to match or exceed native performance on HPC while ensuring bit-for-bit reproducibility for deterministic algorithms on similar processor architectures. This approach simplifies deploying climate models across different platforms; for example, CESM 2.2.2 (Community Earth System Model) achieves, on various clusters, throughput in simulated years per computational day within +/- 1% of bare-metal performance for simulations spanning thousands of processors.

Exclusively using generic Conda packages for MPI (Message Passing Interface) applications was challenging in HPC environments. Although OpenMPI included UCX (Unified Communication X) and OFI (OpenFabrics Interfaces), it lacked UCC (Unified Collective Communication) and was not optimized by default for high-performance networks like InfiniBand, RoCE (Remote Direct Memory Access over Converged Ethernet) and HPE (Hewlett Packard Enterprise) Slingshot-11, often defaulting to TCP/IP (Transmission Control Protocol/Internet Protocol) or failing.
 
After updating Conda-Forge’s OpenMPI and MPICH feedstocks, we are adding MVAPICH and ParaStationMPI support to PnetCDF, HDF5, NetCDF-C, NetCDF-Fortran and ESMF (Earth System Modeling Framework) libraries critical for modellers, alongside libFabric and openPMIx (Process Management Interface - Exascale). This incidentally exposed ABI (Application Binary Interface) compatibility issues. Now, MPI toolchains featuring major UCX/OFI/PMIx versions ensure consistent operation across different hosts without affecting numerical results. Using the same Conda environment inside a container, and no hardware-specific optimization, preserves bitwise reproducibility. OMB (Ohio State University Micro-Benchmark) tests for latency, bandwidth and other metrics help confirm whether optimal performance can be achieved.

Such developments enable climate scientists to focus on addressing scientific questions rather than sorting out software dependencies and technical problems. One can write code on a laptop, then effortlessly scale to cloud or supercomputers, and seamlessly run climate simulations in one place and continue them wherever compute resources are available, without worrying about discontinuities. This also frees expensive HPC resources for production instead of consuming them for training, learning, development or testing, which can be performed comfortably elsewhere, without job-scheduling constraints, in the very same software environment.

Conda has primarily been developed with a focus on compatibility, which limits its suitability in highly performance-sensitive applications where locally optimized builds of specific key components are paramount, as is typical in climate modeling. On the other hand, instead of relying on local engineers to install and maintain host software, Conda users can benefit from the work of thousands of open-source contributors who continuously update and test the entire ecosystem.

This strategy fits the session's theme by providing a framework where cloud resources can be utilized for big data without compromising the performance or rigor of HPC environments. Conda and container technologies ought to change how climate scientists approach software management, focusing on ease of use, scalability and reproducibility, thereby potentially altering practices within the field to improve usage of computational resources and leverage community efforts to remain at the forefront.

How to cite: Iaquinta, J., Fouilloux, A., and Ragan-Kelley, B.: Climate Modeling with Conda and Containers to Improve Computational Resource Usage while Achieving Native Performance and Reproducibility, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-2605, https://doi.org/10.5194/egusphere-egu25-2605, 2025.

X4.137
|
EGU25-8127
|
ECS
Mark Melotto, Rolf Hut, and Bart Schilperoort

The eWaterCycle platform allows hydrologists to work with each other's models and data without having to become computer scientists in the process. It supports existing hydrological models and makes them available to scientists using the BMI model interface as a communication layer. Models run in containers for reproducibility and dependency control. Popular hydrological models are readily available (PCR-GLOBWB, Wflow, HBV, etc.). Scientists develop their analyses and experiments in the widely known JupyterHub environment.

While in theory eWaterCycle can be installed and run on any hardware, in practice most users interact with it on the SURF Research Cloud, a cloud computing infrastructure available to the Dutch academic ecosystem. Until recently, upscaling from cloud to HPC infrastructure for larger model runs required extensive knowledge of the HPC system. Here we will present our work on building a seamless workflow that allows scientists to upscale their cloud-based work to the Snellius supercomputer and the Spider grid computer without having to worry about technical issues such as mount points for (large) datasets and container engines.

Our workflow opens up the possibility for more scientists to benefit from HPC and Grid resources while focussing on their domain science. We present the workflow in such a format that it should be easily portable to other hybrid cloud - HPC infrastructures, including the DestinE systems.

How to cite: Melotto, M., Hut, R., and Schilperoort, B.: Seamless Upscaling Research from Cloud to HPC using eWaterCycle, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-8127, https://doi.org/10.5194/egusphere-egu25-8127, 2025.

X4.138
|
EGU25-9432
Claudio Pisa, Marica Antonacci, Vasileios Baousis, Sotirios Aspragkathos, Iasonas Sotiropoulos, and Stamatia Rizou

Europe faces a growing frequency of extreme weather events, from heatwaves and floods to wildfires and earthquakes, increasingly threatening urban environments. Unusually warm winters are becoming ever more common, destabilizing ecosystems and altering traditional weather dynamics.

Addressing these crucial changes, the CLIMRES project aims to foster a “Leadership for Climate-Resilient Buildings” by identifying and categorizing vulnerabilities within the built environment and assessing their effects within urban systems. This effort integrates diverse data sources, including Copernicus services, IoT networks, and municipal datasets, and considers hazard warnings and weather forecasts. Moreover, a liaison with the Destination Earth initiative enhances the project with the capacity to leverage extreme weather predictions and future climate models. 

CLIMRES aims to deliver vulnerability assessment and impact evaluation methodologies, along with a “hub of measures” inventory of cost-effective building designs and materials against climate risks. It will also provide decision-support tools to aid building owners, policymakers and stakeholders in planning effective interventions and addressing vulnerabilities, targeting three levels of decision making: strategic, tactical and operational. The project deploys cloud technologies like OpenStack and Kubernetes to host an interoperable platform for vulnerability analysis, data harmonization, and decision-making. Its solutions will be tested and validated in three Large-Scale Pilots in Spain, Greece, Italy, and Slovenia, addressing hazards such as heatwaves, flooding, fires, and earthquakes. A multi-hazard replication pilot in France will further evaluate the scalability and versatility of these approaches across diverse contexts.

Insights from these pilots will feed into a replication roadmap and a capacity-building program designed to train future leaders in climate-resilient urban development. By fostering co-creation with local stakeholders and communities, CLIMRES ensures its innovative solutions are practical, cost-effective, and replicable, targeting Technology Readiness Levels (TRL) 6-8. 

CLIMRES aims to bridge innovation with actionable solutions, equipping building owners, policymakers, and communities with the tools needed to enhance urban climate resilience. This presentation highlights the project’s interdisciplinary approach, outputs and technological underpinnings, offering insights into scalable solutions for climate adaptation in urban settings. 

How to cite: Pisa, C., Antonacci, M., Baousis, V., Aspragkathos, S., Sotiropoulos, I., and Rizou, S.: Co-Creating Cloud-Based Tools for Urban Climate-Resilience: The CLIMRES Project, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-9432, https://doi.org/10.5194/egusphere-egu25-9432, 2025.

X4.139
|
EGU25-10977
Marica Antonacci, Vasileios Baousis, Claudio Pisa, Stamatia Rizou, and Iasonas Sotiropoulos

The BUILDSPACE project harnesses the transformative potential of cloud computing to evolve urban development and resilience practices. By integrating advanced Earth Observation (EO) data with state-of-the-art satellite and cloud technologies, BUILDSPACE addresses critical urban challenges, including climate adaptation, energy efficiency, and disaster resilience, while contributing to the European Green Deal’s objectives of sustainability and carbon neutrality. 

Central to BUILDSPACE are five innovative services designed to support urban decision-making. At the building scale, the project facilitates the generation and visualization of detailed digital twins through interactive displays, virtual reality (VR), and augmented reality (AR) interfaces. These digital twins enable precise simulations for energy optimization, operational efficiency, and climate impact assessment. At the city scale, BUILDSPACE provides tools to address climate scenarios, such as urban heat islands and flooding, empowering municipalities and urban planners with actionable insights through interactive, map-based platforms. 

The project’s technical foundation lies in a robust, cloud-native architecture built on Kubernetes and OpenStack, combined with a DevOps methodology to streamline both infrastructure services and application deployment. Kubernetes orchestrates containerised workloads, enabling efficient automated deployment, scaling and management of applications, while OpenStack provides a flexible infrastructure for managing compute, storage, and networking resources. Through the DevOps approach, we ensure continuous integration and delivery (CI/CD), fostering rapid development cycles and operational agility. By adopting open-source cloud platforms, the project ensures interoperability, reproducibility and automation across diverse environments, driving consistency and efficiency throughout the lifecycle of both infrastructure and applications. 

The project’s services are being validated across four European cities representing diverse climatic conditions, namely Warsaw, Riga, Piraeus and Ljubljana. These validations focus on two scenarios: construction companies monitoring building processes with advanced digital tools, and municipalities analysing the impacts of climate change on urban infrastructure. 

By advancing from TRL 5-6 to TRL 7-8, BUILDSPACE aims to deliver market-ready solutions that align with the European GNSS and Copernicus initiatives and to synchronise with advances in Digital Twin technologies and data federation mechanisms from the Destination Earth initiative, while paving the way for broader adoption of cloud technologies in EO-based urban resilience applications.

How to cite: Antonacci, M., Baousis, V., Pisa, C., Rizou, S., and Sotiropoulos, I.: Cloud-Powered Earth Observation Tools for Urban Resilience: The BUILDSPACE Project, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-10977, https://doi.org/10.5194/egusphere-egu25-10977, 2025.

X4.140
|
EGU25-13725
Marco Zazzeri and the PARACELSO team

In recent years, technological advances in the use of geospatial data (such as satellite images, anthropogenic and/or environmental raster and vector open data, etc.) for hydrogeological risk assessment, combined with advanced analysis techniques (e.g., machine learning), have become increasingly valuable. These technologies can be utilized by local and national authorities for land planning and emergency management to better understand the dynamics associated with climate change. This understanding can help guide actions aimed at safeguarding not only environmental resources but also socio-economic assets and citizens’ lives.

In pursuit of this goal, a partnership has been established between the Po River Basin District Authority (AdBPo), the Italian Space Agency (ASI), and academic and research institutions such as the University of Bologna (UNIBO), the University of Modena and Reggio Emilia (UNIMORE), the University of Padova (UNIPD), and the Institute of Environmental Geology and Geoengineering of the National Research Council of Italy (CNR-IGAG). The aim is to implement a downstream service for monitoring landscape evolution related to fluvial systems (geomorphological classification), and slope dynamics (including landslides and rock glaciers) and to quantitatively evaluate the exposed assets.

The PARACELSO project (Predictive Analysis, MonitoRing, and mAnagement of Climate change Effects Leveraging Satellite Observations) aims to develop a modular and interoperable cloud-based platform that supports the analysis of natural phenomena (such as fluvial hydrodynamics, landslides, and rock glaciers) using satellite images provided by:

  • DIAS platforms deployed by the Copernicus Programme (e.g., Sentinel-1 and Sentinel-2);
  • ASI missions such as COSMO-SkyMed, PRISMA, and SAOCOM.

Furthermore, a methodology integrating Earth Observation and geospatial data analysis has been implemented using open-source libraries.

To facilitate this, the MarghERita supercomputer, named in honor of the scientist Margherita Hack, has been made available by the Emilia-Romagna region. It is used both to store the downloaded satellite images and to run the algorithms developed in the project for studying the temporal evolution of river and slope systems. Finally, it enables the sharing and visualization of processed data.

The project has received funding from ASI through the “I4DP_PA (Innovation for Downstream Preparation for Public Administrations)” Call for Ideas.

How to cite: Zazzeri, M. and the PARACELSO team: Cloud-based platform for the management of hydrogeological risks in the Po Basin, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-13725, https://doi.org/10.5194/egusphere-egu25-13725, 2025.

X4.141
|
EGU25-16230
Lars Buntemeyer

High-resolution regional climate model datasets, such as those produced within the Coordinated Regional Downscaling Experiment (CORDEX) framework, are critical for understanding climate change impacts at local and regional scales. These datasets, with their high spatial and temporal resolution, provide detailed insights into region-specific climate phenomena, including urban heat islands, mountainous climates, and extreme weather events. However, their accessibility and usability are often constrained by technical challenges such as fragmented data storage, inconsistent formats, and limited interoperability.

To address these barriers, we are developing the Climate Service Database (CSD) - a centralized data warehouse designed to streamline the temporal and spatial aggregation of CORDEX datasets for climate service applications. The CSD ingests raw CORDEX datasets and applies automated extraction, transformation, and loading (ETL) workflows to produce analysis-ready datasets tailored to user needs. By leveraging cloud-based infrastructure and adhering to Climate and Forecast (CF) conventions, the CSD ensures consistent, interoperable data products that are optimized for scalable access and analysis.
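
The ETL pattern described above can be sketched in a few lines. This is a minimal illustration, not the CSD's actual API: the variable-name map, the unit-conversion table, and the record layout are assumptions chosen to show how raw output might be harmonized toward CF conventions.

```python
# Sketch of a transform step in an ETL workflow: rename raw variables to
# CF standard names and normalize units. All names here are illustrative.

CF_NAME_MAP = {"temp": "air_temperature", "pr": "precipitation_flux"}
UNIT_CONVERSIONS = {("air_temperature", "degC"): lambda v: v + 273.15}  # to kelvin

def transform(record):
    """Map raw (value, unit) pairs onto CF standard names and canonical units."""
    out = {}
    for name, (value, unit) in record.items():
        cf_name = CF_NAME_MAP.get(name, name)
        convert = UNIT_CONVERSIONS.get((cf_name, unit))
        if convert is not None:
            value, unit = convert(value), "K"
        out[cf_name] = (value, unit)
    return out

raw = {"temp": (21.5, "degC"), "pr": (0.0002, "kg m-2 s-1")}
print(transform(raw))
```

In a production pipeline this step would run inside the automated workflow after extraction and before loading the analysis-ready product into cloud storage.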

A core functionality of the CSD is its ability to aggregate datasets at multiple spatial and temporal scales, ranging from daily extremes to decadal averages, and across diverse spatial resolutions (e.g., countries, administrative regions, or watersheds). This capability enables the generation of climate indicators (e.g., hot summer days, heavy precipitation events) that are directly relevant for local decision-making and impact assessments. By providing data in cloud-optimized, analysis-ready formats (ARCO) and offering Software as a Service (SaaS), the CSD significantly lowers the technical barriers for researchers, businesses, and policymakers seeking to access user-tailored climate service datasets.
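
One of the temporal aggregations mentioned above can be sketched as follows. The 30 °C threshold and the synthetic daily series are illustrative assumptions; they are not taken from the CSD.

```python
# Sketch of deriving a "hot summer days" indicator (days with Tmax >= 30 degC)
# per year from a daily maximum-temperature series.
import numpy as np

rng = np.random.default_rng(42)
years = np.repeat([2020, 2021], 365)          # two years of daily timestamps
tmax = rng.normal(loc=20, scale=8, size=730)  # synthetic daily Tmax in degC

def hot_summer_days(years, tmax, threshold=30.0):
    """Count, per year, the days whose maximum temperature reaches the threshold."""
    return {int(y): int(np.sum((years == y) & (tmax >= threshold)))
            for y in np.unique(years)}

print(hot_summer_days(years, tmax))
```

The same grouping pattern generalizes to spatial aggregation: replace the year labels with country, region, or watershed identifiers.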

By centralizing and optimizing the processing of regional climate model datasets, the CSD fosters collaboration across research institutions, public agencies, and climate-tech startups. It enables users to efficiently access consistent and up-to-date data while eliminating the redundancies of localized data storage and processing. This approach also opens new opportunities for applying AI-driven analytics and machine learning models to CORDEX data, paving the way for innovative climate services and applications.

Through its focus on regional climate model datasets, the CSD exemplifies how modern data infrastructures can enhance the usability of high-resolution climate data, empowering stakeholders to develop robust, data-driven adaptation and mitigation strategies in response to the challenges of climate change.

How to cite: Buntemeyer, L.: Advancing Regional Climate Data Accessibility through a Cloud-native Climate Service Database, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-16230, https://doi.org/10.5194/egusphere-egu25-16230, 2025.

X4.142
|
EGU25-18336
Milana Vuckovic and Becky Hemingway

ECMWF has been providing resources on its operational high-performance computing (HPC) and cloud facilities (European Weather Cloud) to researchers and institutions through the Special Projects framework, which was established almost 50 years ago as part of the creation of ECMWF. ECMWF's HPC facility is specifically designed to support both the operational, time-critical production of global weather forecasts and typical research workflows. Through Special Projects, researchers therefore gain access not only to a top-tier high-performance computing and cloud facility and one of the largest meteorological archives in the world, but also to full user support.
Special Projects are defined as experiments or investigations of a scientific or technical nature, undertaken by one or more ECMWF Member States, that are likely to be of interest to the general scientific community. The main aim of this initiative is to facilitate collaboration, enabling the development of innovative methodologies and tools for numerical weather prediction, climate and environmental modelling, and other disciplines within the Earth System Sciences. All Special Project applications undergo a review process by ECMWF and its Scientific Advisory Committee (SAC), as well as by the ECMWF Member States' meteorological services, and are ranked primarily by their scientific quality.
This poster will describe the Special Projects framework and showcase three recent Special Projects that illustrate the collaborative nature of the initiative using ECMWF's HPC and European Weather Cloud facilities: validating the ICON model on ECMWF systems, developing the next-generation European Earth System Model (EC-EARTH4), and mapping the as-yet uncharted continuum of cyclone dynamics for the Euro-Atlantic domain.
Through these examples, the poster will demonstrate how ECMWF Special Projects foster international collaboration, resource sharing, and innovation, enabling advancement in Earth System Science. 

How to cite: Vuckovic, M. and Hemingway, B.: Advancing Earth System Science through collaboration: An overview of ECMWF Special Projects, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-18336, https://doi.org/10.5194/egusphere-egu25-18336, 2025.

X4.143
|
EGU25-18285
|
ECS
Justus Magin, Jean-Marc Delouis, Lionel Zawadski, Julien Petiton, Max Jones, and Tina Odaka

Regridding data from diverse sources, such as satellite observations and numerical models, is a critical task in Earth system sciences. Proper interpolation methods are essential to ensure data fidelity when combining or comparing datasets on different grids. This becomes especially relevant in the context of emerging grid systems like Discrete Global Grid Systems (DGGS), specifically HEALPix.

DGGS are spatial reference systems designed to partition the Earth’s surface into a hierarchy of equal-area cells. Unlike traditional latitude-longitude grids, DGGS use tessellations, such as hexagons, to represent the Earth’s curved surface with minimal distortion. This grid system is particularly suited for handling global-scale geospatial data by providing uniform coverage and resolution, enabling efficient storage, processing, and analysis.

HEALPix (Hierarchical Equal Area isoLatitude Pixelation) is a specific implementation of DGGS widely used in astronomy and Earth sciences. HEALPix divides the sphere into equal-area cells following an iso-latitude structure, making it computationally efficient for operations such as spherical harmonics and multi-resolution analysis. Originally developed for astrophysical applications, it has become increasingly popular in the Earth sciences for representing satellite data, model outputs, and other geospatial datasets in a way that preserves area integrity and facilitates seamless multi-resolution data integration.
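
The hierarchy described above is easy to compute with: HEALPix tiles the sphere with 12 base cells, and each refinement level splits every cell into four, giving 12 · nside² equal-area cells at resolution parameter nside. A minimal sketch:

```python
# HEALPix resolution hierarchy: 12 * nside**2 equal-area cells per level.
import math

def n_cells(nside):
    """Total number of HEALPix cells at resolution parameter nside."""
    return 12 * nside ** 2

def cell_area_sr(nside):
    """Area of one cell in steradians; every cell at a level is equal-area."""
    return 4 * math.pi / n_cells(nside)

for nside in (1, 2, 4, 1024):
    print(nside, n_cells(nside), cell_area_sr(nside))
```

Because cell counts grow by exactly a factor of four per level, parent-child relations between resolutions are trivial to compute, which is what enables the seamless multi-resolution integration mentioned above.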

By leveraging these grid systems, particularly HEALPix, we can achieve a more accurate and efficient representation of geospatial data.

The Pangeo ecosystem includes an array of powerful regridding tools, each tailored to specific grid types and applications. However, navigating this ecosystem to identify the most suitable tool and workflow can be challenging.

In this presentation, we will give an overview of regridding solutions within Pangeo, highlighting their capabilities, limitations, and applications. We will also demonstrate a practical regridding workflow to the HEALPix grid using model outputs or simulated satellite data such as the Odysea dataset (Aviso+ Altimetry. (n.d.). Simulated Level-2 Odysea Dataset. Retrieved from https://www.aviso.altimetry.fr/en/data/products/value-added-products/simulated-level-2-odysea-dataset.html on January 14, 2025). This workflow will make use of recent technological advances to make it efficient and reproducible, such as VirtualiZarr for fast metadata access and Dask for scalable operations, with the output saved as chunked Zarr stores for seamless integration with downstream analysis.
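
The aggregation step at the heart of such a workflow can be sketched with plain NumPy. This is an illustrative simplification, not the authors' implementation: in practice the cell index for each source sample would come from a HEALPix routine such as healpy's ang2pix applied to the sample's coordinates, and the arrays here are synthetic.

```python
# Sketch of regridding scattered samples onto HEALPix by per-cell averaging.
import numpy as np

nside = 2
n_cells = 12 * nside ** 2        # number of equal-area HEALPix cells

rng = np.random.default_rng(0)
cell_ids = rng.integers(0, n_cells, size=1000)  # stand-in for ang2pix output
values = rng.normal(size=1000)                  # source field samples

# Mean per cell via bincount; cells that received no samples become NaN.
sums = np.bincount(cell_ids, weights=values, minlength=n_cells)
counts = np.bincount(cell_ids, minlength=n_cells)
regridded = np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)

print(regridded.shape)  # one value per HEALPix cell
```

In the full Pangeo workflow this computation would run lazily over Dask-chunked Xarray data, with the per-cell result written to a chunked Zarr store.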

How to cite: Magin, J., Delouis, J.-M., Zawadski, L., Petiton, J., Jones, M., and Odaka, T.: Regridding Satellite and Model Data to DGGS (HEALPix) Using the Pangeo Ecosystem, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-18285, https://doi.org/10.5194/egusphere-egu25-18285, 2025.

X4.144
|
EGU25-20590
Cédric Pénard, Nathan Amsellem, Boris Gratadoux, Bastien Barthet, Jean Christophe Pere, Johannes Staufer, Laure Chaumat, and Alexia Mondot

Dhemeter is a weather and environmental data aggregator. It is built on a microservices architecture to handle a wide variety of data from providers such as NOAA, ECMWF, EUMETSAT, Météo-France, DWD, and Copernicus. Aggregation, concatenation, and consistency functionalities have been successfully implemented for meteorological data. This versatile tool accommodates numerical model data, in-situ observations, remote sensing data, and reanalyses, allowing for online data retrieval from multiple sources.

Key features of the aggregator include:

  • Concatenation of Multiple Data Sources: Users can combine data according to selected categories such as Observations, Forecasts, and Reanalyses.
  • Standardization of Physical Data: This involves spatial and temporal interpolation as well as geographical selections to ensure uniformity.
  • Storage of Resulting Data Structures: The data is stored in a pivot format that facilitates access and distribution of scientific data, specifically in the NetCDF format.
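
The temporal side of the standardization feature listed above can be sketched as follows. The time axes and values are illustrative assumptions, not Dhemeter's data model: two sources sampled at different frequencies are linearly interpolated onto a common hourly axis so they can be concatenated.

```python
# Sketch of temporal standardization: put 6-hourly observations and 3-hourly
# model output onto one common hourly time axis before concatenation.
import numpy as np

target_hours = np.arange(0, 24, 1.0)          # common hourly axis

obs_hours = np.array([0.0, 6.0, 12.0, 18.0])  # 6-hourly observations
obs_temp = np.array([10.0, 14.0, 20.0, 15.0])

model_hours = np.arange(0, 24, 3.0)           # 3-hourly model output
model_temp = 12.0 + 6.0 * np.sin(model_hours / 24 * 2 * np.pi)

obs_on_target = np.interp(target_hours, obs_hours, obs_temp)
model_on_target = np.interp(target_hours, model_hours, model_temp)

merged = np.stack([obs_on_target, model_on_target])  # shape: (source, time)
print(merged.shape)
```

Spatial interpolation follows the same pattern with a two-dimensional target grid, and the merged result would then be written to the pivot NetCDF format for distribution.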

The microservices architecture of the aggregator allows for the extensibility of the offered data catalog, and an API is available for users to make direct queries to chosen data sources.

In the short to medium term, the goal is to enhance the tool further, evolving it into a comprehensive data distribution and aggregation system that centralizes and simplifies access to various types of data, including meteorological, oceanographic, and air quality data.

Dhemeter focuses on ease of use, extensibility, scalability, and customization, offering users capabilities for data fusion and harmonization.

How to cite: Pénard, C., Amsellem, N., Gratadoux, B., Barthet, B., Pere, J. C., Staufer, J., Chaumat, L., and Mondot, A.: Dhemeter: Data Hub for Environmental and METEorological Resources, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-20590, https://doi.org/10.5194/egusphere-egu25-20590, 2025.