ESSI – Earth & Space Science Informatics
Tuesday, 5 May
Cloud computing and high-performance computing (HPC) have become essential infrastructures for processing large-scale Earth Observation (EO) and Earth System modeling data. The convergence of these paradigms—combined with containerization, AI/ML frameworks, and cloud-native storage—is reshaping how we manage, analyze, and share geoscientific information.
Pangeo (pangeo.io) is a global open community developing scalable, interoperable workflows using tools such as Xarray, Dask, Zarr, and Jupyter. Discrete Global Grid Systems (DGGS) offer a complementary paradigm: equal-area, multi-resolution indexing that enables seamless integration across domains and scales. Together, these approaches support FAIR (Findable, Accessible, Interoperable, Reusable) data management and reproducible, transdisciplinary research.
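As a minimal sketch of how these tools fit together (the Zarr URL below is a placeholder, not a real endpoint), a typical Pangeo workflow opens a cloud-hosted datacube lazily and computes on it in parallel:

```python
import xarray as xr

# Open a cloud-hosted Zarr store lazily; chunks={} keeps the on-disk chunking
# and backs the arrays with Dask, so nothing is read until a computation runs.
# The URL is a placeholder for illustration only.
ds = xr.open_zarr("https://example-bucket.s3.amazonaws.com/sst.zarr", chunks={})

# A monthly climatology, evaluated in parallel across Dask workers.
climatology = ds["sst"].groupby("time.month").mean("time")
result = climatology.compute()  # triggers the distributed computation
print(result)
```

The same pattern scales from a laptop to an HPC or cloud cluster simply by pointing Dask at a different scheduler, which is part of what makes such workflows portable and reproducible.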
We invite contributions that explore Cloud and HPC workflows for Earth science, including but not limited to:
• Big data platforms, cloud federations, and interoperable infrastructures (IaaS, PaaS, SaaS)
• Cloud-HPC convergence for EO and modeling workloads
• DGGS-based data organization, indexing, and multi-resolution analysis
• Cloud-native AI/ML applications for geoscientific data
• Reproducible workflows and executable notebooks using Pangeo tools
• Cloud storage solutions, data lakes, and FAIR data management
• Sustainable and green computing practices
We welcome case studies, technical developments, and community-driven initiatives that advance open, scalable, and interoperable Earth data science.
The development of digital twins of Earth systems, such as Destination Earth, is revolutionizing how we understand and manage our planet’s complex dynamics under a changing climate. These advanced simulations enable us to integrate diverse types and sources of data, providing a comprehensive view of Earth-climate dynamics and human-environment interactions. Specifically, digital twins make it possible to replicate a system’s behaviour, provide an up-to-date picture of ongoing physical processes, and support informed decision-making. They enable predictive Earth observation, exploring "what if" scenarios or simulating hazard cascades, and testing various adaptation strategies.
This session will explore the role of digital twins in bridging observations and simulations to applications in impact sectors. There will be a special focus on uncertainty quantification, data assimilation, multi-source data streams, hybrid modelling, and decision support. We are particularly interested in studies that highlight the synergies between digital twin technology and other AI-driven tools, such as predictive analytics and machine learning, in improving operational outcomes. This session aims to foster cross-disciplinary dialogue on how these converging technologies can accelerate resilience to climate-related risks and natural hazards across a variety of impact sectors (energy, food, carbon storage, etc.), extending to economic and social components and policy considerations. It will act as a forum for researchers and practitioners to share their insights and recent developments in this rapidly evolving field.
This session provides a platform for showcasing state-of-the-art methods and techniques to assess risks associated with hydro-climatic extremes such as floods, storms, and landslides, as well as compound dry hazards such as droughts, heatwaves, and fires. When these events overlap in time and space or follow one another, their compound nature generates cascading impacts on water resources, ecosystems, infrastructure, and human systems that cannot be captured by single-hazard analyses alone. We aim to exchange knowledge and insights into how machine learning algorithms, data mining techniques, physical models, and the integration of satellite data can significantly enhance predictive capabilities for analyzing the societal risks associated with hydro-climatic extremes and compound hazard events. The session highlights innovative applications and real-world case studies demonstrating how these technologies can be applied to disaster risk reduction, emergency response, and climate adaptation. Through discussions on the latest methodologies and practical applications, the session will facilitate cross-disciplinary collaboration between remote sensing experts, ecologists, climate scientists, AI researchers, hydrologists, and decision makers.
Key Themes:
Processes:
Physical processes involved in hydro-climatic extremes and compound hazards (e.g., droughts-heatwaves-fires), their preconditioning factors, enabling mechanisms, feedbacks, emergent properties, and synergistic effects. Interactions and impacts of such events on the physical system, ecosystems, and human populations.
Methods & techniques:
Integration of remote sensing, data mining, and machine learning approaches to enhance the detection, monitoring, and prediction of hydro-climatic extremes and compound events. Combination of physically-based hydrological and climatological models with AI-driven simulations, as well as applications across multiple spatial and temporal scales, from local case studies to regional and global assessments.
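As a minimal illustration of the kind of compound-event analysis solicited here, the sketch below flags co-occurring dry and hot conditions in a time series; the variables, quantile thresholds, and synthetic inputs are purely illustrative:

```python
import numpy as np
import xarray as xr

def compound_dry_hot(precip, tmax, p_q=0.2, t_q=0.9):
    """Flag time steps where low precipitation and high temperature co-occur.
    Quantile thresholds are illustrative choices, not recommendations."""
    dry = precip < precip.quantile(p_q, dim="time")
    hot = tmax > tmax.quantile(t_q, dim="time")
    return dry & hot  # boolean mask of compound dry-hot conditions

# Synthetic daily series standing in for remote sensing or reanalysis data.
rng = np.random.default_rng(42)
precip = xr.DataArray(rng.gamma(2.0, 1.0, 365), dims="time")
tmax = xr.DataArray(rng.normal(25.0, 5.0, 365), dims="time")

mask = compound_dry_hot(precip, tmax)
print(f"{int(mask.sum())} of {mask.size} days are compound dry-hot")
```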
Reliability in water research depends on two key aspects: the availability of robust observational data and the rigorous selection and validation of model frameworks. This session highlights the importance of data acquisition, quality control, and curation in supporting reliable methodologies across hydraulic and hydrologic engineering.
In hydraulics, flume experiments provide controlled, high-quality datasets but are resource-intensive and limited in scalability. Numerical modeling offers greater flexibility to simulate diverse flow conditions, yet its accuracy is highly sensitive to parameterization, boundary conditions, and discretization schemes. In hydrology, sparse and uncertain field data further complicate model calibration and validation.
Recent advances in artificial intelligence (AI) and machine learning (ML) allow researchers to analyze large and heterogeneous datasets. However, risks arise when dataset adequacy, representativeness, or validation are overlooked, leading to ambiguous outcomes. These issues intensify when experimental, numerical, and AI-driven approaches are not cross-validated or integrated, weakening robustness and transferability.
This session aims to strengthen understanding of data curation and model selection as critical, though often overlooked, components in solving water resource challenges. Topics of interest include:
1. Strategies for data acquisition, handling, and curation across laboratory, field, numerical, and AI/ML approaches.
2. Best practices in optimization, calibration, and hyper-parameterization to improve model performance.
3. Frameworks for integrating laboratory, field, and computational datasets for consistency and cross-validation.
4. Data curation methods that enhance efficiency, reproducibility, and reliability in modeling.
Through interdisciplinary dialogue, the session seeks to generate methodological insights and practical guidelines that enhance accuracy in data handling and model selection. The overarching goal is to advance high-quality, validated, and context-relevant outcomes that strengthen resilience and reliability in water research.
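As one small, concrete example of cross-validation across datasets (topic 3 above), the sketch below scores a simulated series against two independent observation sources using the Nash-Sutcliffe efficiency; all arrays are synthetic placeholders:

```python
import numpy as np

def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 performs no better
    than the observed mean, and negative values perform worse."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Synthetic stand-ins for a flume experiment, field gauges, and a model run.
observed_flume = np.array([1.2, 1.8, 2.5, 3.1, 2.9])
observed_field = np.array([1.1, 1.9, 2.6, 3.3, 2.7])
simulated = np.array([1.3, 1.7, 2.4, 3.2, 3.0])

print("NSE vs flume:", round(nash_sutcliffe(observed_flume, simulated), 3))
print("NSE vs field:", round(nash_sutcliffe(observed_field, simulated), 3))
```

Agreement with both sources, not just one, is what lends a model the robustness and transferability discussed above.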
Machine learning (ML) and artificial intelligence (AI) are transforming the way we study the cryosphere. These data-driven tools are rapidly increasing in popularity and offer potential impact throughout the scientific workflow, from the way we design studies, observe processes, collect data, model phenomena, and analyse systems to the way we construct and test hypotheses. While ML and AI methods applied across the cryosphere may be originally intended to answer a particular cryospheric question, the solutions developed to solve these specific problems may offer generalisable approaches and transferable insights to issues in other domains of the cryosphere. As such, this session invites contributions using ML and AI from all branches of cryospheric science, including snow and avalanches; permafrost; glaciology; ice caps, ice sheets, ice shelves and icebergs; sea ice; and freshwater ice. We also welcome contributions focusing on dataset development, theoretical research, and community-building initiatives. This session intends to provide a forum for cross-cutting discussions and knowledge exchange, fostering interdisciplinary collaboration and ultimately promoting the efficient and effective application of ML and AI in the cryosphere.
Social (bring your own lunch): Tuesday, 12:30. The coordinates in what3words are shams.gangway.edgy.
Remote sensing products have a high potential to contribute to monitoring and modelling of water resources. Nevertheless, their use by water managers is still limited due to a lack of quality, resolution, trust, accessibility, or experience.
In this session, we look for new developments that support the use of remote sensing data for water management applications from local to global scales. We welcome research aimed at improving the quality of remote sensing products, such as higher spatial and/or temporal resolution mapping of land use and agricultural practices; improved assessments of river discharge, lake and reservoir volumes, and groundwater resources; drought monitoring and modelling, including impacts on water-stressed vegetation; and monitoring and modelling of irrigation volumes. We are interested in quality assessment of remote sensing products through uncertainty analysis or evaluations against alternative sources of data. We also welcome contributions combining different techniques (e.g., physically based models or artificial intelligence techniques) or integrating multiple sources of data (remote sensing and in situ) across various imagery types (satellite, airborne, drone).
Finally, we wish to attract presentations on the development of user-friendly platforms (following FAIR principles) that provide smooth access to remote sensing data for water applications. We are particularly interested in applications of remote sensing for characterizing human-water interactions and climate change impacts on the whole water cycle (including inland and coastal links).
Join our tutorial on discovering, sharing, and learning with Earth System Sciences data: 1) How to find high-quality datasets for your data-driven projects, including scientific and governmental sources, 2) Tips for selecting the right (disciplinary) repository for sharing your data - according to your needs and particularly addressing the FAIRness and Openness principles, and 3) How to find and use open online courses and educational materials (OER) to leverage discovered data.
We will demonstrate tips, tricks, and how-tos using the NFDI4Earth services OneStop4All (https://onestop4all.nfdi4earth.de/) and the Knowledge Hub (https://knowledgehub.nfdi4earth.de/).
You are invited to share your experiences, best practices, or favorite repositories with the community, and take away practical skills and knowledge to enhance your research.
AI is a game changer in the quest to better understand Earth data. ML allows models to be trained for virtually any purpose, and many of them are published on open platforms like Hugging Face and Kaggle. In practice, however, such models are not easy to use, particularly for non-experts, due to several blockers: models typically need highly specific data preprocessing that requires skillful Python coding; model metadata are sparse, not standardized, and in particular not machine-readable, so human intervention is required; and a model's comfort zone is not always clearly delineated, while outside of it model accuracy and reliability can drop drastically, in some cases below 20%.
Recent work in research and standardization aims to overcome these obstacles in the quest for easy-to-use, zero-coding, reliable ML on spatio-temporal Earth data. Based on ongoing research in the EU-funded FAIRgeo project, we discuss AI-Cubes, a novel paradigm that embeds ML inference seamlessly into the geo datacube query standard, WCPS. Further, the concept of Model Fencing aims to derive hints about a model's comfort zone so that the server can automatically decide whether the model is applicable to the selected region and warn the user.
Live demos, several of which the audience can reproduce, serve to illustrate the challenges and solution approaches. Ample time will be reserved for active discussion with the audience.
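For a flavour of the idea, the sketch below shows what an AI-Cube request could look like from Python. The endpoint, coverage name, request parameters, and especially the predict() clause are illustrative assumptions on our part; in-query ML inference is precisely what the proposed extension adds and is not part of standard WCPS today:

```python
import requests

# Hypothetical WCPS endpoint; parameter names vary between server products.
ENDPOINT = "https://datacube.example.org/ows"

# A WCPS-style query with a hypothetical predict() clause applying a named
# model to a spatial subset server-side (illustrative syntax only).
query = """
for $c in (Sentinel2_L2A)
return encode(
    predict("crop_classifier", $c[Lat(48:49), Lon(11:12)]),
    "image/tiff")
"""

resp = requests.get(ENDPOINT, params={"service": "WCPS", "query": query})
resp.raise_for_status()
with open("prediction.tif", "wb") as f:
    f.write(resp.content)  # inference ran server-side; no local ML stack needed
```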
Agenda (tentative):
- Introduction
- Using AI: A platform provider perspective
- Current challenges of AI on EO
- The AI-Cube approach: Making AI simpler, safer, faster
- Summary & outlook
- Discussion
Europe has embarked on an ambitious journey to build the next generation of digital replicas of our planet. The European Commission’s Destination Earth (DestinE) initiative is at the heart of this effort: a multi-year programme, implemented by ECMWF, ESA and EUMETSAT, that is developing high-precision digital twins of the Earth system to model, monitor and simulate natural phenomena, hazards and the related human activities. DestinE combines cutting-edge Earth observations, advanced Earth system modelling, Artificial Intelligence (AI), and Europe’s most powerful supercomputers to deliver actionable insights on climate adaptation, disaster risk reduction, and sustainable development. Complementing this effort, ESA’s Digital Twin Earth programme, together with EU Horizon Europe projects and national initiatives, is advancing the scientific foundations and Earth observation components that underpin these digital twins.
This session invites contributions that explore the applications of Earth system digital twins, co-designed with stakeholders, ranging from extreme event prediction to long-term climate adaptation, from urban liveability to marine and hydrological systems. Building on the successful Digital Twin sessions at EGU in recent years, this session offers a forum for sharing user perspectives that will help shape Europe’s digital twin ecosystem and its global relevance.
Foundation Models (FMs) are set to revolutionize domains like Earth Observation (EO) and Earth Sciences. Trained on vast unlabeled datasets via self-supervised learning, they can uncover complex patterns and latent information. Once pre-trained, Geospatial FMs can be adapted to diverse tasks with minimal fine-tuning or additional data. As a result, this paradigm shift is set to reshape the entire information value chain, with far-reaching implications for industry, research and development, and the broader scientific community.
This session aims to share the latest research and technological advances and discuss practical solutions for effectively integrating FMs into the Earth Observation and Earth Sciences ecosystems. We encourage interdisciplinary collaboration, and submissions from AI researchers, EO and Earth data scientists and industry experts, as well as from stakeholders from High-Performance Computing (HPC), Big Data, and EO application communities.
The main topics for the session are:
● Latest Advances in AI Foundation Models: FMs can process data from various sensors, including multi- or hyper-spectral, SAR, LiDAR, and more, enabling holistic analysis of the Earth's dynamics. Recent progress marks a shift from sensor-specific models toward sensor-aware or sensor-agnostic architectures.
● Benchmarking and Evaluating Foundation Models: Establishing standardised fair evaluation metrics and benchmarks to assess the performance and capabilities of FMs, ensuring reliability and efficiency, moving beyond simplistic or canonical use cases.
● Embedding and Geospatial Semantic Data Mining: FMs enable advanced geospatial semantic mining by leveraging latent space embeddings to uncover meaningful patterns and relationships. This enhances interpretation while reducing the need for large volumes of raw data across time and space.
● Implications of Foundation Models for the Community: Understanding the potential societal, environmental, and economic impacts of FMs, fostering informed decision-making and resource management. Seamless integration with downstream systems such as digital twins, public dashboards, and early warning platforms, including deployment at the edge (e.g. onboard satellites), is essential. The emerging role of Agentic AI, in synergy with Large Language Models (LLMs), opens new pathways for autonomous, context-aware EO applications.
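To make the embedding-based mining topic above concrete, here is a toy sketch of semantic similarity search over FM embeddings; in practice the vectors would come from a pretrained geospatial encoder rather than a random generator:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for latent embeddings of 10,000 archived image tiles (768-d)
# and of one query tile; a real FM encoder would produce these.
archive = rng.normal(size=(10_000, 768))
query = rng.normal(size=768)

# Cosine similarity between the query and every archived tile.
scores = (archive @ query) / (
    np.linalg.norm(archive, axis=1) * np.linalg.norm(query)
)

# The five semantically closest tiles, found without touching any raw pixels.
top5 = np.argsort(scores)[::-1][:5]
print("Most similar tiles:", top5)
```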
Sitting under a tree, you feel the spark of an idea, and suddenly everything falls into place. The following days and tests confirm: you have made a magnificent discovery — so the classical story of scientific genius goes…
But science as a human activity is error-prone, and might be more adequately described as "trial and error". Handling mistakes and setbacks is therefore a key skill for scientists. Yet, we publish only those parts of our research that did work. That is also because a study may have a better chance of being accepted for scientific publication if it confirms an accepted theory or reaches a positive result (publication bias). Conversely, the cases that fail in their test of a new method or idea often end up in a drawer (which is why publication bias is also sometimes called the "file drawer effect"). This is potentially a waste of time and resources within our community, as other scientists may set about testing the same idea or model setup without being aware of previous failed attempts.
Thus, we want to turn the story around, and ask you to share 1) those ideas that seemed magnificent but turned out not to be, and 2) the errors, bugs, and mistakes in your work that made the scientific road bumpy. In the spirit of open science and in an interdisciplinary setting, we want to bring the BUGS out of the drawers and into the spotlight. What ideas were torn down or did not work, and what concepts survived in the ashes or were robust despite errors?
We explicitly solicit Blunders, Unexpected Glitches, and Surprises (BUGS) from modeling and field or lab experiments and from all disciplines of the Geosciences.
In a friendly atmosphere, we will learn from each other’s mistakes, understand the impact of errors and abandoned paths on our work, give each other ideas for shared problems, and generate new insights for our science or scientific practice.
Here are some ideas for contributions that we would love to see:
- Ideas that sounded good at first, but turned out not to work.
- Results that presented themselves as great in the first place but turned out to be caused by a bug or measurement error.
- Errors and slip-ups that resulted in insights.
- Failed experiments and negative results.
- Obstacles and dead ends you found and would like to warn others about.
For inspiration, see last year's collection of BUGS - ranging from clay bricks to atmospheric temperature extremes - at https://meetingorganizer.copernicus.org/EGU25/session/52496.
Your high impact journal demands reproducible research, but your reviewers don't have access to your supercomputer...
You want colleagues in another country to work with the petabytes of data you created, but they cannot access your server easily...
You want your students to run the analysis you did for one region on any other region in the world, but don't want to manage the dependencies on their laptops...
In this short course we will give you hands-on experience in creating, publishing, and sharing workflows that are 'reproducible by design'. Using openly published Jupyter Books, online JupyterHubs, git-pullers, open interfaces, and open data formats, you will build a reproducible workflow within a single short course! Based on a decade of work on the eWaterCycle project for Open and FAIR hydrological modelling, we will teach best practices for making modelling studies truly reproducible, even when they require High Performance Computing resources.
Bring a laptop, but no need to install anything: everything will be online!
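A small building block of this 'reproducible by design' setup is the nbgitpuller link, which drops a participant into a JupyterHub session with the course repository already cloned. The hub URL and repository below are placeholders:

```python
from urllib.parse import urlencode

def nbgitpuller_link(hub_url, repo, branch="main", notebook=""):
    """Build an nbgitpuller link: opening it clones or updates the repository
    inside the user's JupyterHub session, so laptops stay untouched."""
    params = {"repo": repo, "branch": branch}
    if notebook:
        repo_name = repo.rstrip("/").split("/")[-1]
        params["urlpath"] = f"lab/tree/{repo_name}/{notebook}"
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{urlencode(params)}"

# Placeholder hub and repository, for illustration only.
print(nbgitpuller_link(
    "https://jupyter.example.org",
    "https://github.com/example/regional-analysis",
    notebook="analysis.ipynb",
))
```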
Large language models (LLMs) and agentic workflows are rapidly transforming scientific research by enabling new capabilities in literature and data discovery, analysis, coding and insight generation. At the same time, their deployment requires rigorous attention to safety, reliability and trustworthiness in scientific contexts.
This session will highlight both the transformative applications and the critical challenges of using LLMs in science. Key topics include developing specialized guardrails against hallucination and bias; creating robust evaluation frameworks, including uncertainty quantification; ensuring scientific integrity, data governance and reproducibility; and addressing unique scientific risks.
We invite submissions on novel scientific applications of LLMs and agentic workflows, methods that ensure integrity and reproducibility, safety mechanisms (e.g., guardrails, risk mitigation, alignment), responsible AI frameworks (including human-in-the-loop design, fairness, and ethics) and lessons learned from real-world deployments. Our goal is to foster discussion on pathways toward safe, effective and trustworthy use of LLMs for advancing science.
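As a deliberately simple illustration of the guardrail theme, the toy check below only passes an LLM answer onward if every URL it cites resolves to an allowed source; the function, the allowlist, and the policy are all hypothetical, and production guardrails are far more involved:

```python
import re

ALLOWED_HOSTS = {"doi.org", "zenodo.org"}  # illustrative allowlist

def citation_guardrail(answer: str) -> str:
    """Toy output guardrail: flag answers with no citation, or with citations
    pointing outside the allowlist, and route them to human review."""
    hosts = re.findall(r"https?://([^/\s]+)", answer)
    if not hosts:
        return "FLAGGED: no citation found; route to human-in-the-loop review"
    for host in hosts:
        if not any(host == h or host.endswith("." + h) for h in ALLOWED_HOSTS):
            return f"FLAGGED: untrusted source '{host}'"
    return answer  # passes this (intentionally minimal) check

print(citation_guardrail("See https://doi.org/10.xxxx/xxxx for details."))
print(citation_guardrail("Trust me, the trend is obvious."))
```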
Deep learning is revolutionizing the geosciences by enabling advanced pattern recognition and predictive modeling across complex datasets. This session welcomes contributions on applications of deep learning across the full spectrum of the Earth sciences. Submitted abstracts may address topics including, but not limited to:
- Reservoir characterization
- Remote sensing
- Mineral exploration
- Natural hazard forecasting
- Hydrology and climate modeling
Emphasis is placed on architectures, data strategies, explainability, and integration with domain knowledge. Oral and poster presentations are welcome.
Motivation
Although the tradition of writing software has a long history in some communities (e.g., meteorology, climate science), most scientists are not trained software engineers. For early-stage scientific software projects, which are typically developed within small research groups, there is often little expectation that the code will (1) be used by a larger community, (2) be further developed or extended by others, or (3) be integrated into larger projects. This can lead to an “organic” evolution of code bases, resulting in challenges related to documentation, maintainability, usability, reusability, and the overall quality of the software and its results.
The wider availability of large computing resources in recent decades, along with the emergence of large datasets and increasingly complex numerical models, has made it more important than ever for scientific software to be well designed, documented, and maintainable. However, (1) established practices in scientific programming, (2) pressure to produce high-quality results efficiently, and (3) rapidly growing user and developer communities can make it challenging for scientific software projects to
- follow a common set of standards and a consistent style,
- be fully documented,
- be user-friendly, and
- be maintainable, easily extended, or reused.
Session content and objectives
We invite developers or users of software projects to prepare presentations about the challenges and successes in the following topics:
- Good practices for developing scientific software
- Modularization
- Documentation
- Linting
- Version control
- Open source and open development
- Automation of quality checks and unit testing
- Planning new projects
- User requirements and the user-turned-developer problem
- Painless and energy-efficient programming solutions across computing architectures
- Modularization and reliability vs performance and multiplatform capacity
- Large-dataset compression and storage workflows
These presentations will show how different projects across geoscientific fields tackle these problems. We will discuss new strategies for improving scientific software development and for raising awareness within the scientific community that robust and well-structured software development enables meaningful and reproducible results, supports researchers (especially doctoral and post-doctoral students) in their work, and accelerates advances in data- and modelling-driven science.
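As a small, concrete instance of the automated quality checks and unit testing mentioned above, here is a pytest-style test for a hypothetical helper function; the names and behaviour are illustrative:

```python
# test_stats.py -- run with `pytest`; the function under test is hypothetical.
import numpy as np
import pytest

def area_weighted_mean(values, weights):
    """Weighted mean, e.g. for latitude-weighted global averages."""
    values, weights = np.asarray(values, float), np.asarray(weights, float)
    if values.shape != weights.shape:
        raise ValueError("values and weights must have the same shape")
    return float(np.sum(values * weights) / np.sum(weights))

def test_uniform_weights_reduce_to_plain_mean():
    assert area_weighted_mean([1, 2, 3], [1, 1, 1]) == pytest.approx(2.0)

def test_shape_mismatch_raises():
    with pytest.raises(ValueError):
        area_weighted_mean([1, 2], [1, 1, 1])
```

Wiring such tests into continuous integration turns the quality checks listed above from good intentions into enforced project infrastructure.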