ESSI3.6 | Collaborative Science Through Free and Open Source Software Tools and Frameworks in Earth sciences (Geo, Weather, Climate)
EDI PICO
Co-sponsored by AGU
Convener: George P. Petropoulos | Co-conveners: Peter Löwe, Paul Kucera, Kaylin Bugbee, Ionut Cosmin Sandric, Christopher Kadow (ECS)
PICO | Thu, 27 Apr, 14:00–18:00 (CEST)
PICO spot 2
Thu, 14:00
In recent decades, the use of geoinformation technology has become increasingly important in understanding the Earth's environment. This session focuses on modern open-source software tools, including those built on top of commercial GIS solutions, developed to facilitate the analysis of primarily geospatial data in various branches of the geosciences. Earth science research has become more collaborative through shared code and platforms, and this work is supported by Free and Open Source Software (FOSS) and shared virtual research infrastructures utilising cloud and high-performance computing. Contributions will showcase practical solutions and applications based on FOSS, cloud-based architecture, and high-performance computing to support information sharing, scientific collaboration, and large-scale data analytics. Additionally, the session will address the challenges of comprehensive evaluations of Earth Systems Science Prediction (ESSP) systems, such as numerical weather prediction, hydrologic prediction, and climate prediction and projection, which require large storage volumes and meaningful integration of observational data. Innovative methods in open frameworks and platforms will be discussed to enable meaningful and informative model evaluations and comparisons for many large Earth science applications from weather to climate to geo in the scope of Open Science 2.0.

PICO: Thu, 27 Apr | PICO spot 2

Chairpersons: George P. Petropoulos, Peter Löwe, Paul Kucera
14:00–14:10 | PICO2.1 | EGU23-8738 | solicited | On-site presentation
Mathieu Gravey

The Open Earth Engine Toolbox (OEET) is an innovative and highly effective suite of tools that makes it easier than ever to work with Google Earth Engine. Consisting of two main components, the Open Earth Engine Library (OEEL) and the Open Earth Engine Extension (OEEex), the OEET is a true game-changer for anyone working in the field of geospatial analysis and data processing.

The OEEL is a set of JavaScript code libraries that provide a wide range of functions and capabilities for working with Earth Engine. From advanced filtering and thresholding techniques, such as the Savitzky-Golay filter and Otsu's method, to powerful visualization tools like north arrows, map scales, mapshots and more, the OEEL has everything you need to get the most out of Earth Engine. And with a convenient Python wrapper, it's easy to integrate the OEEL into your existing workflows, scripts or notebooks.
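Below is a minimal NumPy sketch of the two techniques named above: a Savitzky-Golay smoother and Otsu's threshold selection. It is not OEEL code, since the OEEL applies these operations to Earth Engine objects in JavaScript. The function names, the edge padding and the 256-bin histogram are assumptions chosen for illustration.

```python
import numpy as np


def savgol_coeffs(window: int, order: int) -> np.ndarray:
    """Savitzky-Golay smoothing weights from a local least-squares polynomial fit."""
    if window % 2 == 0 or window <= order:
        raise ValueError("window must be odd and larger than the polynomial order")
    half = window // 2
    x = np.arange(-half, half + 1)
    # Design matrix with columns x**0, x**1, ..., x**order
    design = np.vander(x, order + 1, increasing=True)
    # Row 0 of the pseudo-inverse yields the fitted constant term,
    # i.e. the smoothed value at the centre of the window
    return np.linalg.pinv(design)[0]


def savgol_smooth(series, window: int = 5, order: int = 2) -> np.ndarray:
    """Smooth a 1-D series (e.g. an NDVI time series); edges are padded by repetition."""
    weights = savgol_coeffs(window, order)
    half = window // 2
    padded = np.pad(np.asarray(series, dtype=float), half, mode="edge")
    # Reversing the weights turns np.convolve into a sliding weighted sum
    return np.convolve(padded, weights[::-1], mode="valid")


def otsu_threshold(values, bins: int = 256) -> float:
    """Threshold maximising the between-class variance of a histogram (Otsu's method)."""
    hist, edges = np.histogram(np.ravel(values), bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                  # counts in the lower class for each split
    w1 = w0[-1] - w0                      # counts in the upper class
    s0 = np.cumsum(hist * centers)
    mu0 = s0 / np.maximum(w0, 1)
    mu1 = (s0[-1] - s0) / np.maximum(w1, 1)
    between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to the between-class variance
    k = int(np.argmax(between))
    return float(edges[k + 1])            # upper edge of the last bin in the lower class
```

In a remote-sensing workflow, one might smooth an NDVI time series with `savgol_smooth` and then use `otsu_threshold` to split a scene into, for instance, vegetated and non-vegetated pixels.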

But the OEEex is where the OEET really shines. This Chrome extension is designed to work in tandem with the OEEL, unlocking a host of additional features and capabilities. One of the standout features of the OEEex is its ability to run all tasks in a single step. This is particularly useful for those working with large datasets or for those who need to perform the same tasks repeatedly. With the OEEex, you can simply set up your tasks and then let the extension handle the rest, saving the time and effort of clicking on each one. The OEEex also offers a range of customization options for the interface. These include a dark mode, which can be easier on the eyes during long work sessions, and adjustable font sizes to suit your personal preference. It also makes it possible to use Plotly within Earth Engine to produce attractive figures, and much more.

Overall, the OEET is an essential tool for anyone looking to get the most out of Google Earth Engine. Its powerful JavaScript libraries, convenient Python wrapper, and feature-rich Chrome extension make it the go-to choice for geospatial analysis and data processing.

How to cite: Gravey, M.: Open Earth Engine Toolbox, code goodies and extension for Google Earth Engine, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8738, https://doi.org/10.5194/egusphere-egu23-8738, 2023.

14:10–14:12 | PICO2.2 | EGU23-7760 | On-site presentation
James Lea, Robert Fitt, Stephen Brough, Georgia Carr, Jonathan Dick, Natasha Jones, Eli Saetnan, and Richard Webster

Investigation of local to global scale environmental change is frequently underpinned by data from climate reanalysis products, yet access to these can be challenging for both new and established researchers. The practicalities of working with reanalysis data often include handling large data files that can limit the scale of analysis users can undertake, and working with specialist data formats (e.g. NetCDF, GRIB) that can pose significant barriers to entry for those unfamiliar with them. Together, these factors limit the uptake and application of climate reanalysis data in both research and teaching of environmental science.

Here we present the Google Earth Engine Climate Tool (GEEClimT), which provides an intuitive “point and click” graphical user interface (GUI) for easy extraction of data from 17 climate reanalysis data products relating to atmospheric and oceanic variables (including, but not limited to: ERA5; ERA5-Land; NCEP/NCAR; MERRA; and HYCOM). The GUI is built within the Google Earth Engine geospatial cloud computing platform, meaning users only require an internet connection to rapidly obtain both point data and area averages for user-defined regions of interest. To ensure usability for a wide range of researchers, students and instructors, both the GUI and its documentation have been co-created with those who may use reanalysis data for research, teaching, and project purposes. The tool has also been designed with flexibility in mind, allowing it to be easily updated as new datasets become available in the Google Earth Engine data catalogue.
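To make "point data and area averages" concrete, here is a small NumPy sketch of both extractions on a regular latitude/longitude grid. It is not GEEClimT code, which relies on Earth Engine's server-side reductions. The cosine-of-latitude area weighting, the rectangular region and the function names are assumptions for illustration.

```python
import numpy as np


def point_value(field, lats, lons, lat, lon):
    """Values at the grid cell nearest to (lat, lon); field has shape (..., lat, lon)."""
    i = int(np.abs(lats - lat).argmin())
    j = int(np.abs(lons - lon).argmin())
    return field[..., i, j]


def area_mean(field, lats, lons, lat_min, lat_max, lon_min, lon_max):
    """Area-weighted mean over a lat/lon box on a regular grid (weights ~ cos(latitude))."""
    lat_sel = (lats >= lat_min) & (lats <= lat_max)
    lon_sel = (lons >= lon_min) & (lons <= lon_max)
    sub = field[..., lat_sel, :][..., lon_sel]
    # Grid cells shrink towards the poles, so weight each row by cos(latitude)
    weights = np.cos(np.deg2rad(lats[lat_sel]))[:, None] * np.ones(int(lon_sel.sum()))
    return (sub * weights).sum(axis=(-2, -1)) / weights.sum()
```

With a leading time axis, both functions return one value per time step, which gives a regional time series.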

GEEClimT is shown to allow users with little or no previous experience of working with climate reanalysis data or coding to obtain temporally comprehensive data for their regions and time periods of interest. Case studies demonstrating the application of the tool to different environmental and ecological settings are presented, showcasing its potentially wide applicability to both research and teaching across environmental science.

How to cite: Lea, J., Fitt, R., Brough, S., Carr, G., Dick, J., Jones, N., Saetnan, E., and Webster, R.: Google Earth Engine Climate Tool (GEEClimT): Enabling rapid, easy access to global climate reanalysis data, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-7760, https://doi.org/10.5194/egusphere-egu23-7760, 2023.

14:12–14:14 | PICO2.3 | EGU23-15225 | On-site presentation
Gunnar Brandt, Alicja Balfanz, Norman Fomferra, Tejas Morbagal Harish, Miguel Mahecha, Guido Kraemer, David Montero, Stephan Meißl, Stefan Achtsnit, Josefine Umlauft, Anja Neumann, Alex Horton, Martin Ewart, Fabian Gans, and Anca Anghelea

The Deep Earth System Data Lab (DeepESDL, https://earthsystemdatalab.net) provides an AI-ready, collaborative environment enabling researchers to understand the complex dynamics of the Earth System using numerous datasets and multi-variate, empirical approaches. The solution builds on work done in previous projects funded by the European Space Agency (CAB-LAB and ESDL), which established the technical foundations and created measurable value for the scientific community, e.g., Mahecha et al. (2020, https://doi.org/10.5194/esd-11-201-2020) or Flach et al. (2018, https://doi.org/10.5194/bg-15-6067-2018). DeepESDL relies heavily on the well-established open-source technology stack for data science in Python, thus ensuring usability and compatibility.

At the core of DeepESDL is programmatic access to various data sources in analysis-ready form, organised in data cubes and combined with adequate computational resources and capabilities, allowing researchers to focus immediately on the efficient analysis of multi-variate and high-dimensional data through empirical methods or AI approaches.
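As an illustration of a multi-variate, data-cube-style analysis, the sketch below computes the per-pixel correlation between two co-registered variables along the time axis. Plain NumPy arrays of shape (time, lat, lon) stand in for DeepESDL's cubes. This is not the DeepESDL or xcube API, and the function name is hypothetical.

```python
import numpy as np


def pixelwise_correlation(cube_a, cube_b):
    """Pearson correlation along time (axis 0) for two co-registered (time, lat, lon) cubes."""
    # Remove each pixel's temporal mean before comparing the variables
    a = cube_a - cube_a.mean(axis=0)
    b = cube_b - cube_b.mean(axis=0)
    return (a * b).sum(axis=0) / np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))
```

The result is a (lat, lon) map. Because every variable in the cube shares the same grid, such maps can be computed for any pair of variables without regridding.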

To ensure proper documentation and discoverability, DeepESDL is building an informative catalogue for finding all available data and the metadata describing them. This includes not only standard information, such as spatial and temporal coverage and versioning, but also the specific transformation methods applied during data cube generation.

The system design has openness, collaboration, and dissemination as key guiding principles. As science teams need proper tooling support to work together efficiently in this virtual environment, one key element of the architecture is the DeepESDL Hub, which provides teams of scientific users with the means to collaborate and exchange versioned results, source code, models, execution parameters, and other artifacts and outcomes of their activities in a simple, safe and reliable way. The tools are complemented by an integrated, state-of-the-art application for visualising all data in the virtual laboratory, including input data, intermediate results and final products.

Furthermore, DeepESDL supports the implementation and execution of Machine Learning workflows on Analysis Ready Data Cubes in a reproducible and FAIR way. All ML artifacts, such as code, data, models, execution parameters, metrics and results, can be shared and versioned. Each step of an experiment's ML workflow can be tracked, supported by integration with open-source tools like TensorBoard or MLflow, so that others can reproduce the experiment and contribute.

Finally, dissemination is essential for the Open Science spirit of DeepESDL. Two applications, xcube Viewer and 4D viewer, offer comprehensive user interfaces for interactive exploration of multi-variate data cubes. Both use the same RESTful data service API provided by xcube Server, which also provides OGC interfaces, so that other OGC-compliant applications, such as QGIS 3, can visualise analysis-ready data cubes generated within DeepESDL.

To foster collaboration, additional features are being explored. These include publishing individual Jupyter Notebooks as storytelling documents, or even as books using Jupyter Book from the Executable Book Project, and DeepESDL User Project Dashboards that may also link to the viewers and notebooks.

How to cite: Brandt, G., Balfanz, A., Fomferra, N., Morbagal Harish, T., Mahecha, M., Kraemer, G., Montero, D., Meißl, S., Achtsnit, S., Umlauft, J., Neumann, A., Horton, A., Ewart, M., Gans, F., and Anghelea, A.: DeepESDL – an open platform for research and collaboration in Earth Sciences, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15225, https://doi.org/10.5194/egusphere-egu23-15225, 2023.

14:14–14:16 | PICO2.4 | EGU23-16145 | ECS | On-site presentation
Verena Bessenbacher, Lukas Gudmundsson, Martin Hirschi, and Sonia I. Seneviratne

The volume of Earth system observations from space and ground has grown massively in recent decades. Despite this increasing abundance, multivariate or multi-source analyses at the interface of atmosphere and land are still hampered by the sparsity of ground observations and the large number of missing values in satellite observations. In particular, there are many instances where some variables are observed at a particular time and location while others are not, which hinders robust analysis. Gap-filling is therefore necessary, but it is often done implicitly or only for single variables, which can obscure physical dependencies. Here we use CLIMFILL (CLIMate data gap-FILL), a recently developed multivariate gap-filling procedure, to bridge this gap. CLIMFILL combines state-of-the-art spatial interpolation with a statistical gap-filling method designed to account for the dependence across multiple gappy variables. CLIMFILL is applied to a set of remotely sensed and in-situ observations over land that are central to observing land-atmosphere feedbacks and extreme events. The resulting gridded time series spans the years 1995-2020 globally at 0.25° resolution, with monthly gap-free maps of nine variables: ESA CCI surface layer soil moisture, MODIS land surface temperature, diurnal temperature range, GPM precipitation, GRACE terrestrial water storage, ESA CCI burned area, ESA CCI snow cover fraction, and two-meter temperature and precipitation from in-situ observations. Internal verification shows that this dataset can recover time series of anomalies better than state-of-the-art interpolation methods. It shows high correlations with the respective variables of ERA5-Land and can help extend and gap-fill ESA CCI surface layer soil moisture time series for comparison with ISMN station observations. We showcase key features of the newly developed data product using three major fire seasons in California, Australia, and Europe. The accompanying droughts and heatwaves are well represented, and the dataset can serve as a gap-free completion of an otherwise fragmented observational picture of these events. The dataset will be made freely available and can serve as a step towards the fusion of multi-source observations to create a Digital Twin of the Earth.
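The abstract does not detail CLIMFILL's statistical step. The following is therefore a generic sketch of iterative multivariate gap-filling, in which each gappy variable is repeatedly re-predicted by ordinary least squares from all the others. This is a simpler technique swapped in for illustration, not the CLIMFILL algorithm, and it works on a (samples, variables) table rather than on gridded maps. It only shows how dependence between variables can inform the filled values.

```python
import numpy as np


def iterative_impute(X, n_iter=10):
    """Gap-fill a (samples, variables) array containing NaNs, one variable at a time.

    Gaps start at the column mean; each sweep then re-predicts every variable's
    gaps from all other (current) variables with an OLS fit on its observed rows.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = np.where(missing, np.nanmean(X, axis=0), X)  # initial guess: column means
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            gaps = missing[:, j]
            if not gaps.any():
                continue
            others = np.delete(filled, j, axis=1)
            design = np.column_stack([np.ones(len(X)), others])  # intercept + predictors
            coef, *_ = np.linalg.lstsq(design[~gaps], X[~gaps, j], rcond=None)
            filled[gaps, j] = design[gaps] @ coef  # observed values are never changed
    return filled
```

Filling variables jointly lets a gap in one variable borrow information from co-located observations of the others. A single-variable interpolation cannot do this.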

How to cite: Bessenbacher, V., Gudmundsson, L., Hirschi, M., and Seneviratne, S. I.: Gap-filled multivariate observations of global land-climate interactions, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16145, https://doi.org/10.5194/egusphere-egu23-16145, 2023.

14:16–14:18 | PICO2.5 | EGU23-461 | ECS | On-site presentation
Grethell Castillo Reyes, René Estrella, Karen Gabriels, Jos Van Orshoven, Floris Abrams, and Dirk Roose

Afforestation of certain areas of a river catchment can reduce the outflow of sediment from that catchment. We developed algorithms and software, called CAMF, to minimize the sediment outflow, based on a) a model for local sediment production, b) parameters such as retention capacity and saturation threshold, and c) a raster geo-database containing elevation data and land use. The software can also be adapted to model the effect of actions other than afforestation. We implemented both Single Flow Direction (SFD) and Multiple Flow Direction (MFD) methods to simulate flow transport and analyze the differences between the two approaches. MFD methods increase the spatial interaction between cells. As a consequence, the flow simulation with CAMF-MFD, executed in each iteration of the minimization procedure, has a substantially higher computational cost. The total execution time of CAMF can become prohibitive for large geo-databases, since each iteration selects only the cell(s) with the maximum sediment outflow reduction.
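The difference between SFD and MFD routing can be sketched on a single cell of a DEM. In this example, SFD sends all outflow to the steepest downslope neighbour (the D8 rule). MFD splits the outflow among all lower neighbours in proportion to slope raised to a power p. The weighting and p = 1.1 follow a common Freeman-style choice and are assumptions, since the abstract does not say which MFD scheme CAMF uses.

```python
import numpy as np

# Offsets (row, column) of the eight neighbours of a raster cell
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def flow_fractions(dem, i, j, cell_size=30.0, mfd=True, p=1.1):
    """Share of the outflow of cell (i, j) sent to each lower neighbour.

    Returns {(row, col): fraction}; an empty dict means a pit or flat cell.
    """
    slopes = {}
    for di, dj in NEIGHBOURS:
        ni, nj = i + di, j + dj
        if 0 <= ni < dem.shape[0] and 0 <= nj < dem.shape[1]:
            distance = cell_size * np.hypot(di, dj)  # diagonal neighbours are farther away
            slope = (dem[i, j] - dem[ni, nj]) / distance
            if slope > 0:
                slopes[(ni, nj)] = slope
    if not slopes:
        return {}
    if not mfd:
        # SFD (D8): all flow goes to the steepest downslope neighbour
        return {max(slopes, key=slopes.get): 1.0}
    # MFD: split the flow in proportion to slope ** p
    weights = {cell: s ** p for cell, s in slopes.items()}
    total = sum(weights.values())
    return {cell: w / total for cell, w in weights.items()}
```

Because MFD spreads each cell's outflow over several receivers, a change at one cell affects more of the catchment. This is consistent with each CAMF-MFD simulation being costlier.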

In each iteration of the minimization procedure, a sediment flow simulation is performed for each candidate cell. Since these simulations are independent of each other, we parallelized CAMF for multi-core processors using Open Multi-Processing (OpenMP) directives. Each thread executes the simulations for a subset of the candidate cells. The simulations are distributed over threads with dynamic scheduling, which handles the load imbalance caused by the varying execution time of the simulation for each candidate cell. We also adapted the algorithm in two ways to accelerate execution. First, in each iteration several cells that produce nearly the same sediment yield reduction at the outlet are selected; a threshold T determines the number of selected cells. Second, a complete ranking of all cells with respect to their potential for sediment yield reduction by afforestation is computed only every K iterations, while in intermediate iterations only N cells are ranked, namely those at the top of the previous complete ranking. Suitable values for T, K and N substantially reduce the computational cost, while the solution quality is typically only slightly lower.
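The acceleration strategy can be sketched as a greedy loop. This is a hypothetical Python outline, not CAMF's code. `reduction(cell, selected)` is an assumed callback standing in for one sediment-flow simulation, T is interpreted as a relative tolerance to the best reduction (the abstract does not define it precisely), and a thread pool stands in for the OpenMP threads.

```python
from concurrent.futures import ThreadPoolExecutor


def greedy_select(candidates, reduction, n_select, T=0.99, K=5, N=50, workers=4):
    """Greedy cell selection with periodic full ranking (hypothetical sketch).

    reduction(cell, selected) returns the outflow reduction obtained by adding
    `cell` to the already `selected` cells.
    """
    selected, remaining = [], set(candidates)
    shortlist, iteration = [], 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while remaining and len(selected) < n_select:
            full = iteration % K == 0
            # Every K iterations rank all cells, otherwise only the top-N shortlist
            cells = list(remaining) if full else [c for c in shortlist if c in remaining]
            if not cells:
                cells, full = list(remaining), True
            # Simulations are independent: each candidate is a separate task, so an
            # idle worker picks up the next one (similar in spirit to dynamic scheduling)
            scores = dict(zip(cells, pool.map(lambda c: reduction(c, selected), cells)))
            ranked = sorted(cells, key=scores.get, reverse=True)
            if full:
                shortlist = ranked[:N]
            best = scores[ranked[0]]
            # Threshold T: also take cells whose reduction is nearly as large as the best
            chosen = [ranked[0]] + [c for c in ranked[1:] if scores[c] >= T * best]
            chosen = chosen[: n_select - len(selected)]
            selected.extend(chosen)
            remaining.difference_update(chosen)
            iteration += 1
    return selected
```

With Python threads, real speedup requires the simulation to release the interpreter lock (for example, compiled or NumPy-heavy code). A process pool or a compiled OpenMP loop avoids that limitation.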

We evaluated the performance of the accelerated variant for minimizing sediment outflow by afforestation using a raster geo-database of the Tabacay catchment (Ecuador), with a cell size of 30 m × 30 m. Of a total of 73 471 non-null cells, 27 246 are candidate cells for afforestation.

A high speedup (≈22 on 28 cores) is obtained, leading to a substantial reduction of the execution time. The accelerated variant produces nearly the same yield reduction at the outlet and selects almost the same cells as the original CAMF-MFD. The difference between the sets of cells selected by the two algorithms is measured by the relative spatial coincidence (RSC). In all considered cases, RSC > 99%.
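As a rough reading of the reported speedup S ≈ 22 on p = 28 cores, the parallel efficiency and an Amdahl's-law estimate of the parallel fraction f can be worked out. This assumes Amdahl's model, with a fixed serial fraction and no load-imbalance or memory effects. The abstract does not state this model, so the figures are only indicative.

```latex
E = \frac{S}{p} = \frac{22}{28} \approx 0.79,
\qquad
S(p) = \frac{1}{(1 - f) + f/p}
\;\Longrightarrow\;
f = \frac{1 - 1/S}{1 - 1/p} = \frac{21/22}{27/28} = \frac{98}{99} \approx 0.99
```

Under these assumptions, only about 1% of the work would remain serial.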

How to cite: Castillo Reyes, G., Estrella, R., Gabriels, K., Van Orshoven, J., Abrams, F., and Roose, D.: Reducing the computational cost of an iterative method for sediment yield minimization by afforestation, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-461, https://doi.org/10.5194/egusphere-egu23-461, 2023.

14:18–14:20 | PICO2.6 | EGU23-1611 | On-site presentation
Agnieszka Zwirowicz-Rutkowska and Paweł Soczewski

The term ‘open data’ refers to data that anyone is free to access, use, modify and share. Two dimensions of data openness are recognized: data must be both legally and technically open. Open spatial data are considered public data available on the web and a component of geospatial infrastructures. For such data, technical access according to open data principles can be provided in many different ways, including services, geoportals or bulk access. As the Web of data evolves, publishing spatial data is essential but also challenging within the Web ecosystem. At the same time, the recast Directive (EU) 2019/1024 on open data and the re-use of public sector information recommends publishing data via an application programming interface (API), and requires it for high-value datasets, so as to facilitate the development of internet, mobile and cloud applications based on such data. The latest OGC, W3C and INSPIRE best practices recommend standards from the OGC API family as a modern way of sharing spatial data via APIs.

The first objective of the presentation is to demonstrate the access point to the spatial data of the Polish Environmental Monitoring and the National Pollutant Release and Transfer Register, which implements the OGC API - Features standard. The second objective is to present the results of an assessment of the environmental geoportal's openness and to discuss the concept of ‘open by design’ in the development of spatial data infrastructures. The study adds to the comparative analysis of spatial data openness in different countries and communities dealing with information on the state of the environment, as well as to the experience of geospatial infrastructure development.
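As a sketch of how a client might use an OGC API - Features access point, the functions below build an `items` request for a collection and read the `next` link that the standard uses for paging. The base URL and collection name are placeholders, not the Polish service. `bbox`, `limit` and `datetime` are query parameters defined in OGC API - Features Part 1: Core.

```python
import json
from urllib.parse import urlencode


def items_url(base_url, collection_id, bbox=None, limit=100, datetime=None):
    """Build an OGC API - Features 'items' request URL for one collection."""
    params = {"limit": limit}
    if bbox is not None:
        # (min lon, min lat, max lon, max lat), interpreted in CRS84 by default
        params["bbox"] = ",".join(str(v) for v in bbox)
    if datetime is not None:
        # RFC 3339 instant or interval, e.g. "2020-01-01T00:00:00Z/.."
        params["datetime"] = datetime
    query = urlencode(params, safe=",/:")
    return f"{base_url.rstrip('/')}/collections/{collection_id}/items?{query}"


def parse_page(body):
    """Return the features of a GeoJSON response and the URL of the next page, if any."""
    page = json.loads(body)
    next_href = next(
        (link["href"] for link in page.get("links", []) if link.get("rel") == "next"),
        None,
    )
    return page.get("features", []), next_href
```

A client would fetch `items_url(...)`, pass the response body to `parse_page`, and keep following the returned link until it is `None`.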

How to cite: Zwirowicz-Rutkowska, A. and Soczewski, P.: The Use of the OGC API Standards for Developing Open Environmental Data in Poland, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-1611, https://doi.org/10.5194/egusphere-egu23-1611, 2023.

14:20–14:22 | PICO2.7 | EGU23-1882 | ECS | Virtual presentation
Ayberk Uyanik

In hydrocarbon exploration, evaluating a prospect’s success ratio relies on assessing each petroleum system element and combining them into a single risk factor. This makes the estimation of chance of success values crucial for reducing exploration risk and making robust investments in a particular region. To standardise this process, a few methods, for both global and basin-scale use, have been proposed in the last 20 years. All of them are table-based methods in which explorationists pick risk values according to the geological conditions they encounter. However, consulting the tables each time can be quite time-consuming. It can also lead to subjective or biased picking of success values, resulting in under- or overestimation of risk factors. To prevent repetition and miscalculations, GeoCos offers a web-based application that turns all table-based methods into interactive selection schemes. Users can choose the geological conditions defined in the table-based methods and display the results. In addition, predictions of three table-based methods can be compared in spider or bar charts. Generated figures can be downloaded for use in publications or presentations. Direct links to the papers describing the published methods are also provided.
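The core calculation can be sketched as follows, assuming the common convention that the geological chance of success is the product of independent element probabilities. The lookup-table values, element names and categories are hypothetical placeholders, not figures from any published method. GeoCos itself is written in JavaScript, so this Python sketch only illustrates the logic.

```python
from math import prod

# Hypothetical probabilities for illustration only; real values come from the
# published table-based schemes that GeoCos implements.
EXAMPLE_TABLE = {
    "source": {"proven": 1.0, "probable": 0.7, "speculative": 0.3},
    "reservoir": {"proven": 1.0, "probable": 0.8, "speculative": 0.4},
    "trap": {"proven": 1.0, "probable": 0.7, "speculative": 0.3},
    "seal": {"proven": 1.0, "probable": 0.6, "speculative": 0.3},
}


def chance_of_success(choices, table=EXAMPLE_TABLE):
    """Multiply the probability picked for each petroleum system element.

    choices: {element: geological condition}, e.g. {"source": "probable", ...}
    """
    return prod(table[element][condition] for element, condition in choices.items())
```

Replacing the manual table lookup with a selection like `choices` is what removes the repeated consultation of tables described above.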

Development consisted of two parts, front-end and back-end. The front-end is based on HTML, CSS, vanilla JavaScript and Plotly.js, while Node.js handles the server side. The open-source nature of this practical tool makes it easy for others to contribute to its further development. The source code is available in the GitHub repository: https://github.com/Ayberk-Uyanik/GeoCos-v2.0

How to cite: Uyanik, A.: GeoCos v2.0: An open source web application for calculating Chance of Success values of exploration wells, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-1882, https://doi.org/10.5194/egusphere-egu23-1882, 2023.

14:22–14:24 | PICO2.8 | EGU23-4038 | Virtual presentation
Markus Müller, Holger Metzler, Verónika Ceballos Núñez, Kostiantyn Viatkin, Thomas Lotze, Jon Wells, Yu Zhou, Cuijuan Liao, Aneesh Chandel, Feng Tao, Yuanyuan Huang, Alison Bennett, Chenyu Bian, Lifen Jiang, Song Wang, Chengcheng Gang, Carlos Sierra, and Yiqi Luo

How to formulate and run an element cycling model within a day?
How to compare many models with respect to many different diagnostics reliably? 
How to allow models to be formulated in different ways?
How to make runnable models transparent without implementation details obscuring the scientific content?

bgc_md2 is an open-source Python library, available on GitHub and Binder.
It provides an extendable set of datatypes that capture the essential properties compartmental models have in common, and it enables the formulation of a model with a few lines of regular Python code. The structure of the model is captured in symbolic math (using SymPy) and can be checked while the model is being created, e.g. by drawing a carbon flow diagram or printing the flux equations using the same mathematical symbols as the publication describing the model.
This can be done long before a complete parameter set is added to the model and the model can be run, e.g. for a benchmark.
The computation of diagnostic variables, both symbolic and numeric, is based on the common building blocks, which avoids the effort, obscurity and possible inconsistency of model-specific implementations. Differences in the data available for different models are addressed by computability graphs.
Instead of following an inflexible relational-database schema, records can have different entries reflecting the available data.
Using the computability graphs, the set of comparable quantities is extended from the stored data to everything that can be computed from it. This allows, for instance, comparing a model described by a collection of fluxes with one described by matrices.
bgc_md2 is an extendable library that provides complex and well-tested tools for model comparison but does not force the user into a rigid framework.
Rather than full automation, it aims at flexibility of use within the Python ecosystem. It can be used interactively in a Jupyter notebook as well as in parallel computations on a supercomputer for global data assimilation, as we do in a current model intercomparison.
Example Jupyter notebooks can be explored interactively on Binder without installation.
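Relating a flux-based description to a matrix-based one can be illustrated for linear, donor-controlled fluxes. The NumPy sketch below assembles the compartmental matrix B of dx/dt = u + Bx from a dictionary of flux rates and solves for the steady state. It is not the bgc_md2 API, which works symbolically with SymPy and also covers nonlinear models. All names and rates are hypothetical.

```python
import numpy as np


def matrix_from_fluxes(pools, internal_rates, out_rates):
    """Compartmental matrix B for dx/dt = u + B x with linear, donor-controlled fluxes.

    internal_rates: {(donor, receiver): k} meaning flux = k * x[donor]
    out_rates:      {donor: k} for losses leaving the system (e.g. respiration)
    """
    index = {name: n for n, name in enumerate(pools)}
    B = np.zeros((len(pools), len(pools)))
    for (donor, receiver), k in internal_rates.items():
        B[index[receiver], index[donor]] += k  # gain of the receiver
        B[index[donor], index[donor]] -= k     # matching loss of the donor
    for donor, k in out_rates.items():
        B[index[donor], index[donor]] -= k
    return B


def steady_state(B, u):
    """Pool sizes at which inputs balance outputs: solve 0 = u + B x."""
    return np.linalg.solve(B, -np.asarray(u, dtype=float))


def mean_transit_time(x_steady, u):
    """Steady-state mean transit time = total stock / total input flux."""
    return float(np.sum(x_steady) / np.sum(u))
```

For example, a leaf pool receiving 1 unit per time step and passing half of its content to soil, which respires 10% per step, settles at leaf = 2 and soil = 10. This gives a mean transit time of 12 steps. The same diagnostics apply whether the model was specified as fluxes or directly as B.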

How to cite: Müller, M., Metzler, H., Ceballos Núñez, V., Viatkin, K., Lotze, T., Wells, J., Zhou, Y., Liao, C., Chandel, A., Tao, F., Huang, Y., Bennett, A., Bian, C., Jiang, L., Wang, S., Gang, C., Sierra, C., and Luo, Y.: Reproducible Open Source Carbon Cycle Models    Biogeochemical Model Database  bgc_md2, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4038, https://doi.org/10.5194/egusphere-egu23-4038, 2023.

14:24–14:26 | PICO2.9 | EGU23-4453 | ECS | On-site presentation
Emmanuel Nyenah, Robert Reinecke, and Petra Döll

Global hydrological models are used for understanding, monitoring, and forecasting the global freshwater system. Their outputs provide crucial water-related information for various audiences, such as scientists and policymakers. WaterGAP is such a global hydrological model, and it has been utilized extensively to assess water scarcity for humans and ecologically relevant streamflow characteristics, considering the impacts of human water use and man-made reservoirs as well as of climate change.

The WaterGAP research software has been developed and modified by researchers with diverse programming backgrounds for over 30 years. During this time, there has been no clear-cut protocol for software development and no defined software architecture; hence, the current software is a collection of over a thousand lines of code with little modularity and documentation. As a result, it is challenging for new model developers to understand the current software and to improve or extend the model algorithm. It is also almost impossible to make the software available to other researchers (e.g., for the reproduction of research results).

Here we present ReWaterGAP, an ongoing project to redevelop WaterGAP into a sustainable research software (SRS). We define SRS as software that (1) is maintainable, (2) is extensible, (3) is flexible (adapts to user requirements), (4) has a defined software architecture, (5) has comprehensive in-code and external documentation, and (6) is accessible (the software is licensed as Open Source and has a DOI (digital object identifier) for proper attribution). The goal is to completely rewrite WaterGAP from scratch with a modular structure, using a modern programming language and state-of-the-art software architecture, and to provide extensive documentation so that the resulting software fulfills the requirements of an SRS while maintaining good computational performance.

In our presentation, we provide insights into our ongoing reprogramming, outline milestones, and provide an overview of applied best practices from the computer science community (such as internal and external code review, test-driven development, and agile development methods). We plan to share the software development lessons we have learned along the way with the scientific community to help them improve their software.
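As an example of test-driven practice, a unit test can encode a physical invariant such as mass balance before the component is written. The linear-reservoir function below is a hypothetical stand-in, not part of WaterGAP or ReWaterGAP. The `test_` functions follow the naming convention that pytest collects automatically.

```python
def linear_reservoir_step(storage, inflow, k, dt=1.0):
    """Advance a linear reservoir (outflow rate = k * storage) by one explicit step.

    Returns (new_storage, outflow_volume). Requires 0 <= k * dt <= 1 so that
    storage cannot become negative.
    """
    if not 0.0 <= k * dt <= 1.0:
        raise ValueError("k * dt must lie in [0, 1] for a stable explicit step")
    outflow = k * storage * dt
    return storage + inflow * dt - outflow, outflow


def test_mass_balance():
    # Written first: whatever the implementation, water must be conserved
    storage, total_in, total_out = 100.0, 0.0, 0.0
    for inflow in [5.0, 0.0, 12.5, 3.0]:
        storage, outflow = linear_reservoir_step(storage, inflow, k=0.1)
        total_in += inflow
        total_out += outflow
    assert abs(storage - (100.0 + total_in - total_out)) < 1e-9


def test_rejects_unstable_step():
    try:
        linear_reservoir_step(10.0, 0.0, k=2.0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for k * dt > 1")
```

Tests like these act as a safety net during a rewrite. If a refactored module breaks conservation, the failure shows up immediately rather than in model output.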

How to cite: Nyenah, E., Reinecke, R., and Döll, P.: Towards a sustainable utilization of the global hydrological research software WaterGAP, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4453, https://doi.org/10.5194/egusphere-egu23-4453, 2023.

14:26–14:28 | PICO2.10 | EGU23-4682 | On-site presentation
Mohan Ramamurthy, Julien Chastang, and Ana Espinoza

Unidata has developed and deployed data infrastructure and data-proximate scientific workflows and software tools using cloud computing technologies for accessing, analyzing, and visualizing geoscience data. These resources are provided to educators and researchers through the Unidata Science Gateway (https://science-gateway.unidata.ucar.edu) and deployed on the U.S. National Science Foundation-funded Jetstream/Jetstream2 cloud computing facility. During the SARS-CoV-2/COVID-19 pandemic, many universities used the Unidata Science Gateway to teach data-centric atmospheric science courses and to conduct several software training workshops that advance data science skills.

The COVID-19 pandemic led to the closure of university campuses with little advance notice. Educators at institutions of higher learning had to urgently transition from in-person teaching to online classrooms. While such a sudden change was disruptive for education, it also presented an opportunity to experiment with instructional technologies that have been emerging for the last few years. Web-based computational notebooks, with their mixture of explanatory text, equations, diagrams and interactive code, are an effective tool for online learning. Their use is prevalent in many disciplines, including the geosciences. Multi-user computational notebook servers (e.g., JupyterHub) enable specialists to deploy pre-configured scientific computing environments for the benefit of learners and researchers. Such tools and environments remove barriers for learners who would otherwise have to download and install complex software tools that can be time-consuming to configure, simplifying workflows and reducing time to analysis and results. They also provide a consistent computing environment for everyone, lowering barriers to accessing data and tools. These servers can be provisioned with computational resources not found in a desktop computing setting and can leverage cloud computing environments and high-speed networks.

The Unidata Science Gateway hosts more than a terabyte of real-time weather data each day from nearly 30 different data streams. In addition, many analysis and visualization tools are made available via the Science Gateway and are linked to these real-time data.

Since spring 2020, when the COVID-19 pandemic led to the closure of universities across the world, Unidata has provided many Earth science departments with computational notebook environments for their classes and labs. To date, we have worked with educators at more than 18 universities to tailor these resources to their teaching and learning objectives. We ensured the technology was provisioned with appropriate computational resources and collaborated to have teaching material immediately available for students, resulting in many successful online learning experiences.

In this presentation, we describe the Unidata Science Gateway resources in detail and discuss how they enabled Unidata to support universities during the COVID-19 lockdown. We will also discuss how Unidata is re-imagining the role of its Science Gateway as a community hub, where university faculty are not only users of the gateway services but also content creators and contributors who share their products and resources.

How to cite: Ramamurthy, M., Chastang, J., and Espinoza, A.: Unidata Science Gateway: A research infrastructure to advance research and education in the Earth System Sciences, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4682, https://doi.org/10.5194/egusphere-egu23-4682, 2023.

14:28–14:30 | PICO2.11 | EGU23-7286 | On-site presentation
Daniel Caviedes-Voullième, Jörg Benke, Ghazal Tashakor, Ilya Zhukov, and Stefan Poll

The compartmentalised (modular) design often found in multiphysics Earth system models allows for progressive offloading of compute-intensive kernels to accelerators. The nature of this process implies that some model components will run on accelerators while others continue to run on CPUs, leading to the use of heterogeneous HPC architectures. Furthermore, different hardware architectures (e.g. CPUs, GPUs, quantum, neuromorphic) within an HPC system can be grouped into modules, each tailored to the requirements of a particular class of algorithms and software, and interconnected with other modules via a shared network. Some of these modules may be focused on energy-efficient scalability, whereas others may be disruptive and experimental. Such a conglomerate of different hardware modules, where each module can work stand-alone or in combination with other modules, leads to the idea of a modular supercomputer architecture (MSA). The first exascale system in Europe (JUPITER) is expected to be modular, building on the experience of the JUWELS system. This new paradigm raises questions about how the performance and scalability of models change when moving from homogeneous to heterogeneous to modular systems.

 

The Terrestrial Systems Modelling Platform (TSMP) is a scale-consistent, highly modular, massively parallel, fully integrated soil-vegetation-atmosphere modelling system coupling an atmospheric model (COSMO), a land surface model (CLM), and a hydrological model (ParFlow), linked together by means of the OASIS3-MCT library. Each of these submodels can be considered as a module, with different domain sizes, computational loads and scalability. This implies that finding the optimal configuration for a given problem requires understanding several levels of non-trivial load balancing. It is currently possible to offload ParFlow to GPUs while keeping COSMO and CLM on CPUs. This enables both heterogeneous and modular configurations, and thus prompts the need to re-evaluate load distribution and scalability to find new optimal configurations. In a previous study, preliminary results on heterogeneous configurations were presented (https://doi.org/10.5194/egusphere-egu22-10006).


In this contribution, we extend that work with a comparative study of performance and scaling for homogeneous, heterogeneous, and modular TSMP jobs. We study strong and weak scaling for different problem sizes and evaluate parallel efficiency for all three configurations on the JUWELS supercomputer. We further explore traces of selected cases to identify changes in behaviour under the different configurations, such as emergent MPI communication bottlenecks and the root causes of load-balancing issues.
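Strong and weak scaling efficiencies are usually computed relative to a baseline run. The minimal Python sketch below uses the common textbook definitions; the timings are invented placeholders, not TSMP results. For heterogeneous or modular jobs one still has to decide what the resource count n stands for (nodes, GPUs, or node-hours), a choice the sketch leaves open.

```python
from typing import Dict


def strong_scaling_efficiency(t_base: float, n_base: int, t_n: float, n: int) -> float:
    """Fixed total problem size: ideal runtime shrinks as 1/n. 1.0 means perfect scaling."""
    return (t_base * n_base) / (t_n * n)


def weak_scaling_efficiency(t_base: float, t_n: float) -> float:
    """Fixed problem size per resource: ideal runtime stays constant. 1.0 means perfect scaling."""
    return t_base / t_n


def efficiency_table(timings: Dict[int, float], mode: str = "strong") -> Dict[int, float]:
    """Map resource counts to efficiency relative to the smallest count in `timings`."""
    n_base = min(timings)
    t_base = timings[n_base]
    if mode == "strong":
        return {n: strong_scaling_efficiency(t_base, n_base, t, n) for n, t in sorted(timings.items())}
    if mode == "weak":
        return {n: weak_scaling_efficiency(t_base, t) for n, t in sorted(timings.items())}
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical wall-clock times in seconds for 1, 2, 4 and 8 nodes.
strong = efficiency_table({1: 800.0, 2: 420.0, 4: 230.0, 8: 140.0}, mode="strong")
weak = efficiency_table({1: 100.0, 2: 104.0, 4: 110.0, 8: 125.0}, mode="weak")
print(strong)
print(weak)
```

Comparing such tables across homogeneous, heterogeneous and modular runs of the same problem is one simple way to see where efficiency starts to drop.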

How to cite: Caviedes-Voullième, D., Benke, J., Tashakor, G., Zhukov, I., and Poll, S.: Comparative performance study of TSMP under homogeneous, heterogeneous and modular configurations, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-7286, https://doi.org/10.5194/egusphere-egu23-7286, 2023.

14:30–14:32 | PICO2.12 | EGU23-7461 | ECS | On-site presentation
Manuel Schlund, Birgit Hassler, Axel Lauer, Bouwe Andela, Lisa Bock, Patrick Jöckel, Rémi Kazeroni, Saskia Loosveldt Tomas, Brian Medeiros, Valeriu Predoi, Stéphane Sénési, Jérôme Servonnat, Tobias Stacke, Javier Vegas-Regidor, Klaus Zimmermann, and Veronika Eyring

Projections from Earth system models (ESMs) are essential to allow for targeted mitigation and adaptation strategies for climate change. ESMs are state-of-the-art numerical climate models used to simulate the vastly complex Earth system, including physical, chemical, and biological processes in the atmosphere, ocean, and on land. Progress in climate science and the increase in available computing resources over recent decades have led to a massive increase in the complexity of ESMs and in the amount of data and insight they provide. For this reason, innovative tools for frequent and comprehensive model evaluation are needed more than ever. One of these tools is the Earth System Model Evaluation Tool (ESMValTool), an open-source community diagnostic and performance metrics tool.


Originally designed to assess output from ESMs participating in the Coupled Model Intercomparison Project (CMIP), ESMValTool expects input data to be formatted according to the CMOR (Climate Model Output Rewriter) standard. While this CMORization of model output is a quasi-standard for large model intercomparison projects like CMIP, it complicates the application of ESMValTool to non-CMOR-compliant data, such as native climate model output (i.e., operational output produced by running the climate model through the standard workflow of the corresponding modeling institute). Recently, ESMValCore, the framework underpinning ESMValTool, has been extended to read and process native climate model output. This is implemented via a CMOR-like reformatting of the input data at runtime. For models using unstructured grids, data can optionally be regridded to a regular latitude-longitude grid to facilitate comparisons with other data sets. The new features are described in more detail in Schlund et al., 2022 (https://doi.org/10.5194/gmd-2022-205) and in the software documentation available at https://docs.esmvaltool.org/en/latest/input.html#datasets-in-native-format.
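The idea of a CMOR-like reformatting at runtime can be illustrated with a small Python sketch: a lookup table maps native variable names to CMOR short names and units, and values are converted in memory. The native names, the table and the `cmorize` helper are hypothetical; this is not ESMValCore's API or its actual fix mechanism.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical lookup table: native name -> (CMOR short_name, CMOR units, conversion).
# The native names are made up; real models use their own names and units.
NATIVE_TO_CMOR: Dict[str, Tuple[str, str, Callable[[float], float]]] = {
    "temp_2m_celsius": ("tas", "K", lambda v: v + 273.15),             # degC -> K
    "precip_mm_per_day": ("pr", "kg m-2 s-1", lambda v: v / 86400.0),  # 1 mm of water = 1 kg m-2
}


def cmorize(native_name: str, values: List[float], attributes: Optional[Dict[str, str]] = None):
    """Rename a native variable and convert its values to CMOR-like conventions in memory."""
    try:
        short_name, units, convert = NATIVE_TO_CMOR[native_name]
    except KeyError:
        raise KeyError(f"no CMOR mapping defined for native variable {native_name!r}") from None
    attrs = dict(attributes or {})  # copy so the caller's metadata is left untouched
    attrs.update({"short_name": short_name, "units": units, "original_name": native_name})
    return short_name, [convert(v) for v in values], attrs


name, values, attrs = cmorize("precip_mm_per_day", [0.0, 86.4], {"source": "native output"})
print(name, values, attrs["units"])
```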


This extension opens up the large collection of diagnostics provided by ESMValTool to the five currently supported ESMs: CESM2, EC-Earth3, EMAC, ICON, and IPSL-CM6. Applications include assessing the models’ performance against observations, reanalyses, or other simulations; the evaluation of new model setups against predecessor versions; the CMORization of native model data for contributions to model intercomparison projects; and monitoring of running climate model simulations. Support for other climate models can be added easily. ESMValTool is an open-source community-developed tool and contributions from other groups are very welcome.

How to cite: Schlund, M., Hassler, B., Lauer, A., Andela, B., Bock, L., Jöckel, P., Kazeroni, R., Loosveldt Tomas, S., Medeiros, B., Predoi, V., Sénési, S., Servonnat, J., Stacke, T., Vegas-Regidor, J., Zimmermann, K., and Eyring, V.: Evaluation of Native Earth System Model Output with ESMValTool, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-7461, https://doi.org/10.5194/egusphere-egu23-7461, 2023.

14:32–14:34 | PICO2.13 | EGU23-9782 | On-site presentation
Bouwe Andela, Peter Kalverla, Remi Kazeroni, Saskia Loosveldt Tomas, Valeriu Predoi, Manuel Schlund, Stef Smeets, and Klaus Zimmermann

The Earth System Grid Federation (ESGF) offers a wealth of climate data for a broad range of research. For example, the output of the latest phase of the Coupled Model Intercomparison Project (CMIP6) comprises 20 petabytes of data. However, the heterogeneity of the data can make it difficult to find and work with. Here, we present new features of ESMValCore, a Python package designed to work with large climate datasets available from ESGF and beyond. ESMValCore now provides a Python interface that makes it easy to discover what data is available on ESGF and locally, download it if necessary, and make it analysis-ready. The analysis-ready data can then be used as input to the ESMValCore preprocessor functions, a collection of functions that perform commonly used analysis steps such as regridding and computing statistics. When searching for data on ESGF, as well as when loading the NetCDF files, the software automatically corrects small metadata issues that would otherwise make working with this data a time-consuming, manual effort. Data and metadata issues are fixed in memory for fast performance. The search and download features are user-friendly and automatically switch to a different server if one of the ESGF servers is unavailable. Several Jupyter notebooks demonstrating these new features are available at https://github.com/ESMValGroup/ESMValCore/tree/main/notebooks.
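Falling back to another server follows a common pattern: try each candidate in turn and move on when a network error occurs. The Python sketch below shows that pattern with a stand-in fetch function and made-up host names; it is not ESMValCore's implementation.

```python
from typing import Callable, Dict, Iterable, Tuple


class AllServersFailed(RuntimeError):
    """Raised when none of the candidate servers could serve the request."""


def fetch_with_failover(servers: Iterable[str], fetch: Callable[[str], bytes]) -> Tuple[str, bytes]:
    """Try each server in order and return (server, payload) from the first that succeeds.

    `fetch` is any callable that downloads from one server; network failures are expected to
    surface as OSError subclasses (e.g. urllib.error.URLError, ConnectionError, TimeoutError).
    """
    errors: Dict[str, OSError] = {}
    for server in servers:
        try:
            return server, fetch(server)
        except OSError as exc:
            errors[server] = exc  # remember why this server failed, then try the next one
    raise AllServersFailed(f"all servers failed: {', '.join(errors) or 'no servers given'}")


# Usage with a stand-in fetch function instead of real ESGF index nodes.
def fake_fetch(server: str) -> bytes:
    if "offline" in server:
        raise ConnectionError(f"{server} is unreachable")
    return b"search results"


used, payload = fetch_with_failover(
    ["https://offline-node.example.org", "https://online-node.example.org"], fake_fetch
)
print(used, payload)
```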

 

ESMValCore has been designed for use on computing systems that are typically used by researchers: it works well on a laptop or desktop computer, but also comes with example configuration files for use on large compute clusters attached to ESGF nodes. For reliable computations, ESMValCore makes use of the Iris library developed by the UK Met Office. This in turn is built on top of Dask, a library for efficient parallel computations with a low memory footprint. In 2023, we aim to improve our use of Dask in collaboration with the Iris developers, for even better computational performance.
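Dask, which Iris uses for lazy arrays, keeps memory use low by splitting data into chunks, reducing each chunk separately, and merging the partial results. The standard-library sketch below shows that principle for a mean; it does not use the Dask or Iris APIs.

```python
from typing import Iterable, Iterator, List, Tuple

Partial = Tuple[float, int]  # (sum, count) for one chunk


def chunks(values: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield consecutive chunks of at most `size` values without materializing the whole input."""
    chunk: List[float] = []
    for value in values:
        chunk.append(value)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def partial_mean(chunk: List[float]) -> Partial:
    """Per-chunk work: independent of every other chunk, so it could run in parallel."""
    return sum(chunk), len(chunk)


def combine(a: Partial, b: Partial) -> Partial:
    """Associative merge of two partial results."""
    return a[0] + b[0], a[1] + b[1]


def chunked_mean(values: Iterable[float], size: int) -> float:
    total: Partial = (0.0, 0)
    for part in map(partial_mean, chunks(values, size)):
        total = combine(total, part)
    if total[1] == 0:
        raise ValueError("mean of empty input")
    return total[0] / total[1]


# Only one chunk of 100 000 values is held in memory at a time.
print(chunked_mean((float(i) for i in range(1_000_000)), size=100_000))
```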

 

For easy reproducibility, ESMValCore also offers “recipes” in which standard analyses can be saved. A large collection of such recipes is available in the Earth System Model Evaluation Tool (ESMValTool). ESMValTool started out as a set of community-developed diagnostics and performance metrics for the evaluation of Earth system models. Recently, it has also proven useful to other users of climate data, such as hydrologists and climate change impact researchers. Both ESMValCore and ESMValTool are developed by and for researchers working with climate data, with the support of several research software engineers. An important recent achievement is the use of these packages to produce the figures for several chapters of the IPCC AR6 report. Documentation for both ESMValCore and ESMValTool is available at https://docs.esmvaltool.org.

 

How to cite: Andela, B., Kalverla, P., Kazeroni, R., Loosveldt Tomas, S., Predoi, V., Schlund, M., Smeets, S., and Zimmermann, K.: User-friendly climate data discovery and analysis with ESMValCore, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-9782, https://doi.org/10.5194/egusphere-egu23-9782, 2023.

14:34–15:45
Chairpersons: Kaylin Bugbee, Ionut Cosmin Sandric, Christopher Kadow
16:15–16:17 | PICO2.1 | EGU23-11204 | On-site presentation
Tamrat Belayneh

Indexed 3D Scene Layers (I3S), an OGC Community Standard for streaming and storing massive amounts of geospatial content, has evolved rapidly over the last four years to capture new use cases and techniques that advance geospatial visualization and analysis. The current version, OGC I3S 1.3, adopted in December 2022, enables efficient transmission of various 3D geospatial data types, including discrete 3D objects with attributes, integrated surface meshes, and point cloud data covering vast geographic areas, as well as highly detailed BIM (Building Information Model) content, to web browsers, mobile apps, and desktop applications.

As an open standard, I3S has been embraced by the Free and Open-Source Software (FOSS) community for streaming massive 3D geospatial content. The strength of I3S is that it enables the composition of 3D geospatial content covering different disciplines and use cases. In this paper, we describe and demonstrate, including through code snippets and Sandcastle examples, I3S consumption in popular web-based 3D visualization libraries such as loaders.gl and CesiumJS.

How to cite: Belayneh, T.: I3S, an OGC 3D Streaming Standard Enabling Geospatial Interoperability and Composability, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-11204, https://doi.org/10.5194/egusphere-egu23-11204, 2023.

16:17–16:19 | PICO2.2 | EGU23-10166 | On-site presentation
Thomas Knudsen

Rust Geodesy (RG) is an open source platform for experiments with geodetic software, transformations, and standards. RG vaguely resembles the well-known open source PROJ transformation system, and was built on the basis of experiments with alternative data flow models for PROJ. The actual transformation functionality of RG is, however, minimal: at the time of writing, it comprises just a few low-level operations:

  • The three, six, seven, and fourteen-parameter versions of the Helmert transformation
  • Horizontal and vertical grid shift operations
  • Helmert's companion, the cartesian/geographic coordinate conversion
  • The full and abridged versions of the Molodensky transformation
  • Three widely used conformal projections: The Mercator, the Transverse Mercator, and the Lambert Conformal Conic projection
  • The Adapt operator, which mediates between various conventions for coordinate units and axis order
  • Also, RG provides access to a large number of primitives from geometrical geodesy, all wrapped as methods on a data model unifying the representation of two- and three-axis ellipsoids.
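To give a concrete idea of two of the primitives listed above, the Python sketch below implements the cartesian/geographic conversion on the GRS80 ellipsoid and a small-angle seven-parameter Helmert transformation in the position-vector convention. It is written from the textbook formulas, not from RG's Rust code; RG's operator names, parameter units and sign conventions may differ.

```python
import math

# GRS80 ellipsoid: semi-major axis (m) and flattening.
A_GRS80 = 6378137.0
F_GRS80 = 1 / 298.257222101


def geographic_to_cartesian(lat_deg, lon_deg, h, a=A_GRS80, f=F_GRS80):
    """Geodetic latitude/longitude (degrees) and ellipsoidal height (m) -> X, Y, Z (m)."""
    e2 = f * (2 - f)  # first eccentricity squared
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = a / math.sqrt(1 - e2 * math.sin(lat) ** 2)  # prime vertical radius of curvature
    x = (n + h) * math.cos(lat) * math.cos(lon)
    y = (n + h) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - e2) + h) * math.sin(lat)
    return x, y, z


def cartesian_to_geographic(x, y, z, a=A_GRS80, f=F_GRS80, iterations=10):
    """Inverse conversion by fixed-point iteration on latitude (not suitable at the poles)."""
    e2 = f * (2 - f)
    lon = math.atan2(y, x)
    p = math.hypot(x, y)
    lat = math.atan2(z, p * (1 - e2))  # initial guess for h = 0
    for _ in range(iterations):
        n = a / math.sqrt(1 - e2 * math.sin(lat) ** 2)
        h = p / math.cos(lat) - n
        lat = math.atan2(z, p * (1 - e2 * n / (n + h)))
    n = a / math.sqrt(1 - e2 * math.sin(lat) ** 2)
    h = p / math.cos(lat) - n
    return math.degrees(lat), math.degrees(lon), h


def helmert7(x, y, z, tx, ty, tz, rx, ry, rz, s_ppm):
    """Small-angle 7-parameter Helmert, position-vector convention.

    Translations in metres, rotations in radians, scale difference in parts per million.
    """
    m = 1 + s_ppm * 1e-6
    xt = tx + m * (x - rz * y + ry * z)
    yt = ty + m * (rz * x + y - rx * z)
    zt = tz + m * (-ry * x + rx * y + z)
    return xt, yt, zt


x, y, z = geographic_to_cartesian(55.68, 12.57, 42.0)
print((x, y, z), cartesian_to_geographic(x, y, z))
```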

While this is sufficient to test the architecture, and while supporting the most important transformation primitives and three of the most used map projections makes it surprisingly useful, it is a far cry from PROJ's enormous gamut of supported map projections: RG is a platform for experiments, not for operational setups.
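Of the three projections, the Mercator is the shortest to write out. The sketch below gives the ellipsoidal forward formula and an iterative inverse, following the standard textbook equations. It is illustrative Python, not RG's implementation, and omits false easting and northing.

```python
import math

A_GRS80 = 6378137.0
F_GRS80 = 1 / 298.257222101


def mercator_forward(lat_deg, lon_deg, lon0_deg=0.0, k0=1.0, a=A_GRS80, f=F_GRS80):
    """Ellipsoidal Mercator: geographic degrees -> easting/northing in metres."""
    e = math.sqrt(f * (2 - f))
    lat = math.radians(lat_deg)
    es = e * math.sin(lat)
    x = k0 * a * math.radians(lon_deg - lon0_deg)
    y = k0 * a * math.log(math.tan(math.pi / 4 + lat / 2) * ((1 - es) / (1 + es)) ** (e / 2))
    return x, y


def mercator_inverse(x, y, lon0_deg=0.0, k0=1.0, a=A_GRS80, f=F_GRS80, iterations=15):
    """Inverse Mercator; latitude found by fixed-point iteration from a spherical first guess."""
    e = math.sqrt(f * (2 - f))
    t = math.exp(-y / (k0 * a))
    lat = math.pi / 2 - 2 * math.atan(t)
    for _ in range(iterations):
        es = e * math.sin(lat)
        lat = math.pi / 2 - 2 * math.atan(t * ((1 - es) / (1 + es)) ** (e / 2))
    lon = lon0_deg + math.degrees(x / (k0 * a))
    return math.degrees(lat), lon


x, y = mercator_forward(60.0, 10.0)
print((x, y), mercator_inverse(x, y))
```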

Fundamentally, RG is a geodesy library rather than a cartography library. And while PROJ benefits from four decades of reality hardening, RG, being a platform for experiments, does not even consider development in the direction of operational robustness. Hence, viewing RG as a PROJ replacement will lead to disappointment.

That said, being written in Rust, with all the memory safety guarantees Rust provides, RG by design avoids a number of pitfalls that are explicitly worked around in the PROJ code base. So the minuscule size of RG compared to PROJ (as measured in lines of code) is not just a matter of functional pruning, but also a matter of developing with a tool wonderfully suited to the task at hand.

Also, having had the advantage of learning from PROJ experience, from both a user's and a developer's perspective, RG is significantly more extensible than PROJ. So for a number of applications, and despite its limitations, RG may be sufficient, and perhaps even useful. First and foremost, however, RG may serve as a vehicle for geodetic development work, eventually feeding new functionality and new transformations into the PROJ ecosystem.

Aims

Dataflow experimentation is just one aspect of RG. Overall, the aims are fourfold:

  • Support experiments for evolution of geodetic standards.
  • Support development of geodetic transformations.
  • Hence, provide easy access to a number of basic geodetic operations, not limited to coordinate operations.
  • Support experiments with data flow and alternative abstractions, mostly as a tool for the other three aims.

All four aims are guided by a wish to amend explicitly identified shortcomings in the existing geodetic system landscape.

How to cite: Knudsen, T.: Rust Geodesy: a new platform for experiments with geodetic software, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10166, https://doi.org/10.5194/egusphere-egu23-10166, 2023.

16:19–16:21 | PICO2.3 | EGU23-10496 | ECS | On-site presentation
H. Sherry Zhang, Dianne Cook, Patricia Menendez, Ursula Laa, and Nicolas Langrene

Indexes are commonly used to combine multivariate information into a single number for monitoring, communicating, and decision-making. They are applied in many areas, including the environment (e.g. drought indexes, the Southern Oscillation Index) and the economy (e.g. the Consumer Price Index, the FTSE). Developers, analysts, and policymakers tend to have their favorite indexes, but there is little transparency about their performance. Indexes are used like black boxes: raw data is entered and a single number is returned, with scarce attention paid to diagnostics. Interestingly, though, all indexes, regardless of their origin, can be constructed with a data pipeline consisting of a series of well-defined steps. This talk will explain this structure and show how you can use it to inspect the behavior of indexes in different scenarios. Our work organizes the vast array of index research and development into a simple set of building blocks. This modular data pipeline is implemented in an R package, which contains some standard indexes and allows others to be coded easily. We will illustrate the benefits of this framework using a drought index, and show how different versions, created by different parameter choices, can lead to potentially different decisions.
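The pipeline idea can be sketched in a few lines of Python: each step is a function, and an index is a composition of steps. The steps below are assumptions for illustration: a rolling sum for temporal aggregation, z-score standardization as a simplified stand-in for the distribution fit used by indexes such as SPI, and hypothetical classification thresholds. The authors' package is written in R and its interface is not reproduced here.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence

Step = Callable[[List[float]], List[float]]


def rolling_sum(window: int) -> Step:
    """Temporal aggregation: total over the trailing `window` time steps."""
    def step(x: List[float]) -> List[float]:
        return [sum(x[i - window + 1 : i + 1]) for i in range(window - 1, len(x))]
    return step


def standardize() -> Step:
    """Normalization: z-scores (SPI fits a distribution instead; this is a simplified stand-in)."""
    def step(x: List[float]) -> List[float]:
        mu, sigma = mean(x), pstdev(x)
        return [(v - mu) / sigma for v in x]
    return step


def pipeline(*steps: Step) -> Callable[[Sequence[float]], List[float]]:
    """Compose steps left to right into one index function."""
    def run(x: Sequence[float]) -> List[float]:
        out = list(x)
        for step in steps:
            out = step(out)
        return out
    return run


def classify(z: float) -> str:
    """Hypothetical thresholds for turning index values into decisions."""
    if z <= -1.5:
        return "severe drought"
    if z <= -1.0:
        return "moderate drought"
    return "no drought"


# Made-up monthly precipitation totals (mm).
precip = [80, 95, 60, 40, 20, 15, 30, 70, 90, 100, 85, 10]
index_1 = pipeline(rolling_sum(1), standardize())(precip)
index_3 = pipeline(rolling_sum(3), standardize())(precip)
print(classify(index_1[-1]), classify(index_3[-1]))
```

With these made-up numbers, the 1-month aggregation labels the final month a severe drought while the 3-month aggregation does not, a small example of how a single parameter choice can change a decision.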

How to cite: Zhang, H. S., Cook, D., Menendez, P., Laa, U., and Langrene, N.: Index construction: a pipeline approach for transparency and diagnostics, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10496, https://doi.org/10.5194/egusphere-egu23-10496, 2023.

16:21–16:23 | PICO2.4 | EGU23-12167 | On-site presentation
Marcus Strobl, Elnaz Azmi, Alexander Dolich, Sibylle K. Hassler, Mirko Mälicke, Ashish Manoj J, Jörg Meyer, Achim Streit, and Erwin Zehe

The amount of digitally available environmental data, and of methods to process these data, is continuously increasing. With the DFG project ISABEL, we build on the existing virtual research environment V-FOR-WaTer to make this abundance of data available through an easy-to-use web portal, foster data publications, and facilitate data analyses. Environmental scientists get access to data from different sources, e.g. state offices or university projects, and can share their own data and tools through the portal. Already integrated tools help to easily pre-process and scale the data and make them available in a consistent format.

V-FOR-WaTer already contains many of the functionalities needed to provide and display data from various sources and disciplines. The detailed metadata scheme is adapted to water and terrestrial environmental data. The datasets currently in the web portal originate from university projects and state offices. Connecting V-FOR-WaTer to GFZ Data Services, an established repository for geoscientific data, will ease the publication of data from the portal and, in turn, provide access to datasets stored in that repository. Key to compatibility with GFZ Data Services and other systems is the compliance of the metadata scheme with international standards (INSPIRE, ISO 19115).

The web portal is designed to facilitate typical workflows in environmental sciences. Map operations and filter options ensure easy selection of the data, while the workspace area provides tools for data pre-processing, scaling, and common hydrological applications. The toolbox also contains more specific tools, e.g. for geostatistics and for evapotranspiration. It is easily extendable and will ultimately include user-developed tools, reflecting the current research topics and methodologies in the hydrology community. Tools are accessed through Web Processing Services (WPS) and can be joined, saved and shared as workflows, enabling complex analyses and ensuring reproducibility of the results.
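An OGC WPS 1.0.0 request can be sent as a key-value-pair (KVP) URL. The Python sketch below builds GetCapabilities, DescribeProcess and Execute URLs; the endpoint and process identifier are hypothetical, and the portal's actual services may expose different processes or use XML POST requests instead.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def wps_url(endpoint, request, identifier=None, inputs=None, version="1.0.0"):
    """Build an OGC WPS 1.0.0 KVP request URL.

    request: "GetCapabilities", "DescribeProcess" or "Execute".
    inputs:  dict of literal inputs, encoded as DataInputs=name1=value1;name2=value2.
    """
    params = {"service": "WPS", "request": request}
    if request != "GetCapabilities":
        params["version"] = version
    if identifier is not None:
        params["identifier"] = identifier
    if inputs:
        params["DataInputs"] = ";".join(f"{name}={value}" for name, value in inputs.items())
    return f"{endpoint}?{urlencode(params)}"  # urlencode percent-encodes '=' and ';' in values


# Hypothetical endpoint and process identifier, for illustration only.
ENDPOINT = "https://wps.example.org/wps"
url = wps_url(ENDPOINT, "Execute", "hypothetical_aggregate", {"station": "S1", "resolution": "daily"})
print(url)
query = parse_qs(urlparse(url).query)
print(query["DataInputs"])
```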

How to cite: Strobl, M., Azmi, E., Dolich, A., Hassler, S. K., Mälicke, M., Manoj J, A., Meyer, J., Streit, A., and Zehe, E.: V-FOR-WaTer goes ISABEL - Current developments in the V-FOR-WaTer Web Portal, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12167, https://doi.org/10.5194/egusphere-egu23-12167, 2023.

16:23–16:25 | PICO2.5 | EGU23-13285 | ECS | On-site presentation
Christopher Kadow, Etor E. Lucio-Eceiza, Martin Bergemann, Andrej Fast, Hannes Thiemann, and Thomas Ludwig

Freva (the Free Evaluation System Framework [1; 2]) is a platform developed by the Earth science community for the Earth science community. Designed to run in HPC environments, it efficiently handles data search and analysis for large projects, institutes, or universities. Freva is written in Python, and its core has undergone a major update. Freva offers:

  • Centralized access. Freva comes in three flavours with similar functionality: a command line interface, a web user interface, and a Python module that allows Freva to be used in Python environments such as Jupyter notebooks.
  • Standardized data search. Freva allows quick and intuitive searches across several centrally stored datasets. The datasets are indexed internally in a SOLR server, with a metadata system that satisfies the international standards provided by the Earth System Grid Federation.
  • Flexible analysis. Freva provides a common interface through which user-defined data analysis tools can be plugged into the system, irrespective of the programming language used. Each plugin can be encapsulated in its own conda environment, facilitating reproducibility and portability to any other Freva instance. Plugins can search the database and write their own results back to it, enabling an ecosystem of different tools. This environment fosters the exchange of results and ideas between researchers, and collaboration between users and plugin developers alike.
  • Transparent and reproducible results. The analysis history and parameter configuration (including Git versions of the tool and the system) of every plugin run are stored in a MariaDB database. Any analysis configuration and result can be consulted and shared among scientists, offering traceability in line with the FAIR data principles and optimizing the use of computational and storage resources.
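Faceted search of this kind can be illustrated with a toy in-memory version: records carry facet values, a query filters on them, and facet counts summarize the hits. The Python sketch below is a conceptual illustration with made-up records; it is not Freva's search interface, which is backed by the SOLR index.

```python
from collections import Counter
from typing import Dict, Iterable, List

Record = Dict[str, str]

# Hypothetical index entries; a real system holds many more in a search server.
RECORDS: List[Record] = [
    {"project": "cmip6", "model": "model-a", "experiment": "historical", "variable": "tas", "time_frequency": "mon"},
    {"project": "cmip6", "model": "model-b", "experiment": "historical", "variable": "tas", "time_frequency": "mon"},
    {"project": "cmip6", "model": "model-a", "experiment": "ssp585", "variable": "tas", "time_frequency": "mon"},
    {"project": "cmip6", "model": "model-a", "experiment": "historical", "variable": "pr", "time_frequency": "day"},
]


def search(records: Iterable[Record], **facets: str) -> List[Record]:
    """Return records matching every given facet constraint (case-insensitive)."""
    return [
        r for r in records
        if all(r.get(key, "").lower() == value.lower() for key, value in facets.items())
    ]


def facet_counts(records: Iterable[Record], facet: str) -> Counter:
    """Count how many records carry each value of `facet`, as a search interface would show."""
    return Counter(r[facet] for r in records if facet in r)


hits = search(RECORDS, project="CMIP6", variable="tas")
print(len(hits), facet_counts(hits, "experiment"))
```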

 

Freva has also been upgraded on the system administration side:

  • Painless deployment via Ansible, with a highly customizable configuration of the services via Docker.
  • Secure system configuration via Vault integration.
  • Straightforward migration from old Freva database servers or between Freva instances.
  • Improvements in the dataset incorporation.
  • Automatic backup of database and SOLR services.

[1] https://www.freva.dkrz.de/
[2] https://github.com/FREVA-CLINT/freva-deployment

How to cite: Kadow, C., Lucio-Eceiza, E. E., Bergemann, M., Fast, A., Thiemann, H., and Ludwig, T.: Freva is dead, long live Freva! New features of a software framework for the Earth System community, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13285, https://doi.org/10.5194/egusphere-egu23-13285, 2023.

16:25–16:27 | PICO2.6 | EGU23-13429 | ECS | On-site presentation
Oriol Tinto and Robert Redl

Data storage is a critical challenge in science as the amount of data being generated continues to increase, and the geosciences and weather are no exception. Addressing this challenge requires data reduction techniques. Although lossless compression might sound ideal, it usually performs poorly on geoscience data, because these data often carry a lot of uncertainty that shows up as random bits, which are hard to compress. Lossy compression methods can do a better job by discarding the insignificant details that make the data hard to compress. Fortunately, research groups have been working on this problem for years and have published open source tools that can provide high compression ratios with acceptable error levels. However, these methods have not yet been widely adopted in the geosciences due to concerns about the loss of information they entail. By combining the best of these open source lossy compressors into a single easy-to-use package, and by demonstrating its effectiveness in several weather applications, we have created a powerful and user-friendly open-source tool that helps reduce data storage needs while preserving scientific conclusions.
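As an illustration of why discarding insignificant bits helps, the Python sketch below applies mantissa bit rounding (keeping a chosen number of float32 mantissa bits, round to nearest, ties to even) before lossless zlib compression. Bit rounding is one well-known lossy preprocessing technique and is not necessarily among the compressors the authors combine; the synthetic field and `keepbits=10` are arbitrary assumptions.

```python
import math
import random
import struct
import zlib
from typing import List, Sequence


def to_bits(values: Sequence[float]) -> List[int]:
    """Store values as little-endian float32 and reinterpret them as unsigned 32-bit integers."""
    n = len(values)
    return list(struct.unpack(f"<{n}I", struct.pack(f"<{n}f", *values)))


def from_bits(bits: Sequence[int]) -> List[float]:
    n = len(bits)
    return list(struct.unpack(f"<{n}f", struct.pack(f"<{n}I", *bits)))


def bitround(values: Sequence[float], keepbits: int) -> List[float]:
    """Round float32 values to `keepbits` explicit mantissa bits (round to nearest, ties to even).

    The dropped trailing bits become zeros, which a lossless compressor can then squeeze out.
    """
    if not 0 <= keepbits <= 22:
        raise ValueError("keepbits must be between 0 and 22 for float32")
    drop = 23 - keepbits
    half = 1 << (drop - 1)
    mask = (0xFFFFFFFF << drop) & 0xFFFFFFFF
    out = []
    for u in to_bits(values):
        u = (u + half - 1 + ((u >> drop) & 1)) & mask  # a carry may propagate into the exponent
        out.append(u)
    return from_bits(out)


# Synthetic "temperature" field: smooth signal plus noise that fills the low mantissa bits.
rng = random.Random(42)
field = [280.0 + 10.0 * math.sin(i / 50.0) + rng.gauss(0.0, 0.5) for i in range(20_000)]
stored = from_bits(to_bits(field))  # the float32 values a file would actually contain
rounded = bitround(field, keepbits=10)

raw = struct.pack(f"<{len(stored)}f", *stored)
lossy = struct.pack(f"<{len(rounded)}f", *rounded)
ratio_raw = len(raw) / len(zlib.compress(raw, 9))
ratio_lossy = len(lossy) / len(zlib.compress(lossy, 9))
max_rel_err = max(abs(r - s) / abs(s) for r, s in zip(rounded, stored))
print(f"zlib only: {ratio_raw:.2f}x, bit rounding + zlib: {ratio_lossy:.2f}x, max rel. error: {max_rel_err:.1e}")
```

The relative error is bounded by half a unit in the last kept bit, 2^-(keepbits+1), so the number of kept bits is the knob that trades storage against preserved information.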

How to cite: Tinto, O. and Redl, R.: Effective Data Compression for Geosciences: An open-source solution to combat data storage challenges, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13429, https://doi.org/10.5194/egusphere-egu23-13429, 2023.

16:27–16:29 | PICO2.7 | EGU23-13778 | ECS | On-site presentation
Chloé Radice, Hélène Brogniez, Pierre-Emmanuel Kirstetter, and Philippe Chambon

Model forecasts are generally assessed with remote sensing data by comparing past simulations with observations. We developed a novel probabilistic comparison method that evaluates tropical atmospheric relative humidity (RH) profiles simulated by ARPEGE, the global numerical weather prediction model of Météo-France, using probability density functions of finer-scale satellite observations as the reference.

The global relative humidity field is simulated by ARPEGE every 6 hours on a 0.25° grid over 18 vertical levels ranging from 100 hPa to 950 hPa. The reference relative humidities are retrieved from brightness temperatures measured by SAPHIR, the passive microwave sounder on board the Megha-Tropiques satellite. SAPHIR has a footprint resolution ranging from 10 km at nadir to 23 km at the edge of the swath, and a vertical resolution of 6 pressure layers (also from 100 hPa to 950 hPa). Owing to the satellite's particular orbit, each point in the tropical belt is observed several times per day.

Footprint-scale RH probability density functions are aggregated (convolved) over the spatial and temporal scales of comparison to match the model resolution and to summarize the patterns over a significant period. This method makes use of more sub-grid information by considering the finer-scale distributions as a whole. This probabilistic approach avoids the classical deterministic simplification of working with a single "best" estimate. The resulting assessment is more contrasted and better adapted to characterizing specific situations in case-by-case studies. It provides significant added value to classical deterministic comparisons by accounting for additional information in the evaluation of the simulated field, especially for model simulations that are close to the traditional mean.
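A much simplified version of such a comparison can be written down for binned PDFs: footprint-scale PDFs are combined into one reference PDF, and the model value is located within its cumulative distribution. Note that the Python sketch swaps in an equal-weight mixture of the footprint PDFs as a simpler aggregation than the convolution described above; the bin edges and PDFs are hypothetical.

```python
from typing import List, Sequence

BIN_EDGES = [float(v) for v in range(0, 101, 10)]  # hypothetical 10 %RH bins from 0 to 100 %


def aggregate_pdfs(pdfs: Sequence[Sequence[float]]) -> List[float]:
    """Equal-weight mixture of footprint PDFs (probability per bin), renormalized to sum to 1."""
    nbins = len(pdfs[0])
    mixed = [sum(p[i] for p in pdfs) / len(pdfs) for i in range(nbins)]
    total = sum(mixed)
    return [m / total for m in mixed]


def cdf_at(pdf: Sequence[float], value: float, edges: Sequence[float] = BIN_EDGES) -> float:
    """Probability that the reference RH lies below `value`, interpolating linearly within a bin."""
    prob = 0.0
    for p, lo, hi in zip(pdf, edges[:-1], edges[1:]):
        if value >= hi:
            prob += p
        elif value > lo:
            prob += p * (value - lo) / (hi - lo)
    return prob


# Two made-up footprints: one dry (0-50 %RH), one moist (50-100 %RH).
reference = aggregate_pdfs([[0.2] * 5 + [0.0] * 5, [0.0] * 5 + [0.2] * 5])
model_rh = 50.0
print(cdf_at(reference, model_rh))
```

A value near 0.5 means the model RH lies near the centre of the observed distribution, while values near 0 or 1 place it in the dry or moist tail, information that a single "best" estimate would hide.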

Comparison results will be shown over the April-May-June 2018 period for two configurations of the ARPEGE model (two parametrization schemes for convection). The probabilistic comparison is discussed with respect to a classical deterministic comparison of RH values.

How to cite: Radice, C., Brogniez, H., Kirstetter, P.-E., and Chambon, P.: Using satellite probabilistic estimates to assess modeled relative humidity : application to a NWP model, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13778, https://doi.org/10.5194/egusphere-egu23-13778, 2023.

16:29–16:31 | PICO2.8 | EGU23-14302 | On-site presentation
Bart Schilperoort, Peter Kalverla, Barbara Vreede, Stefan Verhoeven, Fakhereh Alidoost, Yang Liu, Niels Drost, Jerom Aerts, and Rolf Hut

The ERA5 meteorological reanalysis dataset, from the European Centre for Medium-Range Weather Forecasts (ECMWF), is widely used in areas such as meteorology, hydrology and land-surface modelling. The Copernicus Climate Data Store (CDS) offers two options for accessing the data: a web interface and a Python API. However, automated downloading of the data requires advanced knowledge of Python, and can prove challenging to people less familiar with programming.

Many climate scientists have their own Python scripts to download data from the CDS, each of them responsible for creating and maintaining their own. A quick search on GitHub for Python scripts that call the CDS API yields 1802 results, not even counting scripts stored privately. However, these scripts are by and large not reusable. A few years ago, we created era5cli as a byproduct of a project we were working on, to try to break this pattern of single-use scripts. era5cli enables automated downloading of ERA5 data using a single command.
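To illustrate the kind of boilerplate such scripts repeat, the Python sketch below builds one CDS request per year for a single ERA5 variable. The helper is hypothetical and is not era5cli's code; the dataset name and request keys follow the CDS API format in use at the time of writing and may have changed since. Actually submitting the requests would require the cdsapi package and a CDS account.

```python
from typing import Dict, List, Optional

# CDS dataset name for ERA5 hourly data on single levels.
DATASET = "reanalysis-era5-single-levels"


def yearly_requests(variable: str, start_year: int, end_year: int,
                    hours: Optional[List[int]] = None) -> List[Dict[str, object]]:
    """Build one request dictionary per year (hypothetical helper, not era5cli's code)."""
    hours = list(range(24)) if hours is None else hours
    return [
        {
            "product_type": "reanalysis",
            "variable": variable,
            "year": str(year),
            "month": [f"{m:02d}" for m in range(1, 13)],
            "day": [f"{d:02d}" for d in range(1, 32)],
            "time": [f"{h:02d}:00" for h in hours],
            "format": "netcdf",
        }
        for year in range(start_year, end_year + 1)
    ]


reqs = yearly_requests("2m_temperature", 2000, 2002, hours=[0, 12])
for req in reqs:
    target = f"era5_{req['variable']}_{req['year']}.nc"
    # With cdsapi installed and configured, each request could be submitted as:
    # cdsapi.Client().retrieve(DATASET, req, target)
    print(target, req["time"])
```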

It is inefficient for everyone to write their own copy of the same, or at least similar, code. That is why we asked ourselves whether era5cli still fills a niche and, if so, what we could do to make it easier for others to re-use. In this presentation, we give an overview of our recent efforts to turn era5cli from a utility script into a reusable software package.

Despite the relatively small size of era5cli (around 1000 lines of Python code and comments), maintenance is not trivial. Changes are occasionally made to ERA5 and the CDS, and new Python versions are released while old ones are deprecated. Users of era5cli have helped here by submitting fixes for issues they found as GitHub pull requests, but these still require guidance and/or approval from administrators. By reducing the maintenance load of era5cli, through targeted streamlining of the code, a clean-up of the repository, and expanded developer instructions in the documentation, we lower the threshold for community contributions and successful future maintenance. With this, we aim to make era5cli future-proof.

era5cli can be installed using Python’s pip, as well as using conda/mamba (conda install era5cli -c conda-forge). The source code for era5cli is available at https://github.com/eWaterCycle/era5cli, and the documentation can be found at https://era5cli.readthedocs.io/.

How to cite: Schilperoort, B., Kalverla, P., Vreede, B., Verhoeven, S., Alidoost, F., Liu, Y., Drost, N., Aerts, J., and Hut, R.: era5cli: from a utility script to a reusable software package, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-14302, https://doi.org/10.5194/egusphere-egu23-14302, 2023.

16:31–16:33 | PICO2.9 | EGU23-14791 | ECS | On-site presentation
Mateusz Zawadzki and Marijke Huysmans

Data is at the heart of every project, and its quality determines the reliability of research outcomes. In ever more collaborative geoscientific research, where data comes from many sources and in various formats, the right tools must be used to ensure data integrity and seamless access for all involved.

Since 2019, in the Grow project, we have routinely monitored groundwater quality and levels within an agricultural field where water is reused for irrigation and groundwater recharge. With multiple participants and analysis factors, and a fine temporal resolution of the monitoring, we faced a large volume of unstructured data with poor version control. Standard tools for collaborative research, such as cloud-based Excel spreadsheets, proved ineffective and threatened data integrity. A more robust data management system was urgently needed.

Here we provide an overview of a framework, based on the popular open-source PostgreSQL relational database management system deployed on Amazon Web Services, that helps to overcome data management issues in groundwater monitoring projects. Its main features include user-based, minimum-privilege access, which protects the data from, e.g., accidental deletion, and a hardcoded set of data correctness checks that decreases the likelihood of data input errors. Field data is collected using the QField Cloud mobile application running a preconfigured QGIS project, which sends the data directly to the database. Users also have access to all historical records, helping them detect anomalies on the spot. Laboratory analysis results and data from automatic data loggers without Internet of Things (IoT) modules are processed and uploaded to the database using custom-developed, open-source Python software, providing full transparency. Several IoT devices upload data directly to the database.
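Data correctness checks of this kind are typically expressed as database constraints. The sketch below uses Python's built-in sqlite3 module so that it runs without a server; the table, column names and plausibility ranges are hypothetical, not the project's schema. The CHECK and UNIQUE syntax carries over to PostgreSQL, whereas role-based minimum-privilege access (GRANT/REVOKE) is a PostgreSQL feature that SQLite lacks and is not shown.

```python
import sqlite3

# Hypothetical schema with plausibility checks on each measurement.
SCHEMA = """
CREATE TABLE groundwater_sample (
    well_id       TEXT NOT NULL,
    sampled_at    TEXT NOT NULL,                      -- ISO 8601 timestamp
    water_level_m REAL CHECK (water_level_m BETWEEN -50 AND 50),
    ph            REAL CHECK (ph BETWEEN 0 AND 14),
    ec_us_cm      REAL CHECK (ec_us_cm >= 0),         -- electrical conductivity
    UNIQUE (well_id, sampled_at)                      -- no duplicate readings
);
"""


def insert_sample(conn: sqlite3.Connection, row: tuple) -> bool:
    """Insert one row; return True if accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO groundwater_sample VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
ok = insert_sample(conn, ("P1", "2022-05-01T10:00", -2.35, 7.1, 850.0))
bad_ph = insert_sample(conn, ("P1", "2022-05-02T10:00", -2.40, 71.0, 860.0))  # typo: pH 71
duplicate = insert_sample(conn, ("P1", "2022-05-01T10:00", -2.35, 7.1, 850.0))
print(ok, bad_ph, duplicate)
```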

So far, the new management system has proved a far superior platform for collaborative data analysis compared to existing tools. It significantly improved fieldwork efficiency and provided assurance of the data quality by improving the collection and handling process transparency. Thanks to the hard work of the QGIS and QField communities, as well as the developers and maintainers of PostgreSQL, we are better equipped for the future of geodata analysis.

How to cite: Zawadzki, M. and Huysmans, M.: How PostgreSQL and QField Cloud can streamline data collection and improve data security: experience from a collaborative, interdisciplinary water reuse project in Flanders, Belgium., EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-14791, https://doi.org/10.5194/egusphere-egu23-14791, 2023.

16:33–16:35 | PICO2.10 | EGU23-15104 | On-site presentation
Lise Seland Graff, Kajsa M. Parding, and Oskar A. Landgren

When investigating multi-model ensembles it can be useful to evaluate model performance to make sure that the historical climate of the fields of interest is captured to a satisfactory degree. To this end we define a simple climatology score, based on the root-mean-square error (RMSE) of essential climate variables from the historical experiments of an ensemble of models participating in the sixth Coupled Model Intercomparison Project (CMIP6). 

We consider four key variables: near-surface temperature, precipitation, 850-hPa zonal wind, and 850-hPa air temperature. The focus is on monthly climatologies of global values, but we also explore the sensitivity of the scores to changes in the regions and seasons considered. 
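A score of this kind could be computed as in the Python sketch below: an area-weighted RMSE per model and variable, normalized by the ensemble median for each variable so that different units become comparable, then averaged over variables. The cos(latitude) weighting, the median normalization and the example numbers are assumptions; the abstract does not specify how the RMSEs are combined.

```python
import math
from statistics import median
from typing import Dict, Sequence


def weighted_rmse(model: Sequence[float], reference: Sequence[float], lats_deg: Sequence[float]) -> float:
    """Area-weighted RMSE over grid points (e.g. all months x grid cells), cos(latitude) weights."""
    weights = [math.cos(math.radians(lat)) for lat in lats_deg]
    sq = sum(w * (m - r) ** 2 for w, m, r in zip(weights, model, reference))
    return math.sqrt(sq / sum(weights))


def climatology_scores(rmse: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Combine per-variable RMSEs ({variable: {model: rmse}}) into one score per model.

    Lower is better; 1.0 means "as good as the ensemble median" on average.
    """
    models = list(next(iter(rmse.values())))
    normalized = {
        var: {m: err / median(per_model.values()) for m, err in per_model.items()}
        for var, per_model in rmse.items()
    }
    return {m: sum(normalized[var][m] for var in rmse) / len(rmse) for m in models}


# Made-up RMSEs for two variables and three hypothetical models.
example = {
    "tas": {"model-A": 1.0, "model-B": 2.0, "model-C": 3.0},
    "pr": {"model-A": 0.5, "model-B": 1.0, "model-C": 1.5},
}
scores = climatology_scores(example)
print(scores)
```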

The purpose of the score is to help identify models with relatively large errors in the representation of the variables of interest. This can be useful when considering models to include for storyline analysis or when selecting a subset of models for regional downscaling.

How to cite: Graff, L. S., Parding, K. M., and Landgren, O. A.: Evaluating CMIP6 models with a simple  climatology score, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15104, https://doi.org/10.5194/egusphere-egu23-15104, 2023.

16:35–16:37 | PICO2.11 | EGU23-15272 | On-site presentation
Klaus Zimmermann, Lars Bärring, Joakim Löw, and Carolina Nilsson

Climate indices have long been an important tool for evaluating climate impacts and are commonly used in both research and practical risk assessment. The term generally refers to advanced statistics derived from daily data, such as the longest spell of consecutive dry days in a year or the maximum precipitation in a single day in a given month. Initially, these indices were calculated for station data, but with the advent and increasing utility of global and regional climate models, they are now commonly calculated for gridded data. Higher spatial resolution, simulations spanning hundreds of years, ever-larger ensembles of climate simulations, and interest in a wider selection of possible future climate scenarios together render some established, serial algorithms ineffective. This is compounded by the fact that modern computing architectures no longer derive their growing power from faster individual computing units, but from integrating larger numbers of parallel computing units. To fully use this potential, a straightforward parallelization of existing algorithms is not enough; rather, the computing task must be rethought from the start within a parallel framework.
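As an illustration of what rethinking a computation for parallel execution can mean, consider the longest dry spell. A serial algorithm scans the days one by one with a running counter, so it cannot simply be split in time. If each time chunk is instead reduced to a small summary (its length, the dry run at its start, the dry run at its end, and the longest run inside it), the summaries can be merged with an associative operation, so chunks can be processed independently and combined in any grouping as long as their time order is kept. This is a minimal sketch of that general technique for a single time series, not Climix's actual implementation; the names are hypothetical, and the 1 mm dry-day threshold follows the common ETCCDI convention for consecutive dry days but is only a default here.

```python
from dataclasses import dataclass
from functools import reduce

import numpy as np


@dataclass(frozen=True)
class Spell:
    """Summary of one time chunk for the longest-dry-spell index."""
    length: int   # number of days in the chunk
    prefix: int   # consecutive dry days at the start of the chunk
    suffix: int   # consecutive dry days at the end of the chunk
    longest: int  # longest dry run inside the chunk


def summarize(dry):
    """Reduce a 1-D boolean chunk (True = dry day) to a Spell; chunks are independent."""
    dry = np.asarray(dry, dtype=bool)
    n = int(dry.size)
    if dry.all():  # also covers an empty chunk
        return Spell(n, n, n, n)
    prefix = int(np.argmin(dry))        # index of the first wet day
    suffix = int(np.argmin(dry[::-1]))  # distance of the last wet day from the end
    padded = np.concatenate(([0], dry.astype(np.int8), [0]))
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    longest = int((ends - starts).max()) if starts.size else 0
    return Spell(n, prefix, suffix, longest)


def combine(a, b):
    """Merge the summaries of two consecutive chunks (associative, order-sensitive)."""
    a_all_dry = a.prefix == a.length
    b_all_dry = b.prefix == b.length
    return Spell(
        length=a.length + b.length,
        prefix=a.length + b.prefix if a_all_dry else a.prefix,
        suffix=b.length + a.suffix if b_all_dry else b.suffix,
        longest=max(a.longest, b.longest, a.suffix + b.prefix),
    )


def longest_dry_spell(precip_mm, dry_threshold=1.0, n_chunks=4):
    """Longest run of days with precipitation below dry_threshold."""
    dry = np.asarray(precip_mm) < dry_threshold
    chunks = np.array_split(dry, n_chunks)
    # Each summarize() call could run on a separate worker; here they run in a loop.
    summaries = [summarize(c) for c in chunks]
    return reduce(combine, summaries).longest
```

Because `combine` only needs the four numbers of each neighbour, the same reduction also works across a tree of workers, and the result does not depend on how the time axis was chunked.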

Here, we present parallel algorithms, implemented in the Python framework Climix, which have proven useful for calculating climate indices for a large ensemble of climate simulations that forms the basis of the user-oriented climate service of the Swedish Meteorological and Hydrological Institute.

Climix is available as open-source software and allows the calculation of a large number of climate indices both through a command-line interface, with good support for common HPC schedulers such as SLURM, and through a flexible Python API.

How to cite: Zimmermann, K., Bärring, L., Löw, J., and Nilsson, C.: Climix—a flexible suite for the calculation of climate indices, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15272, https://doi.org/10.5194/egusphere-egu23-15272, 2023.

16:37–16:39
|
PICO2.12
|
EGU23-15399
|
ECS
|
Virtual presentation
|
Viktoria Wichert, Holger Brix, and Doris Dransch

Fluvial extreme events, such as floods and droughts, have an impact beyond the river bed. Changes in river discharge and in the concentrations of nutrients and pollutants in freshwater also affect coastal waters, especially their biogeochemistry. Examining these impacts has traditionally been difficult, as one first needs to detect the river plume in the seawater and then infer its spatio-temporal extent. The Riverplume Workflow was developed to support researchers in these tasks: it helps them identify regions of interest and provides tools for a preliminary analysis of the impacts of riverine extreme events on coastal waters.

The Riverplume Workflow is an open-source software tool to detect and examine freshwater signals as anomalies in marine observational data. Data from a FerryBox, an autonomous measuring device installed on a commercial ferry, provide regular coverage of the German Bight, the region for which we developed this toolbox. Combined with drift model computations, these data make it possible to detect anomalies in the observations and to understand their propagation and origin.
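The abstract does not describe the drift model itself. The sketch below only illustrates the principle of following a water parcel forwards and backwards in time: it advects a particle through a hypothetical, steady velocity field (solid-body rotation with an illustrative rate) using the classical fourth-order Runge-Kutta scheme. Running it with a negative time step from the forward end point returns to the starting position, which is the idea behind using backward trajectories to locate where an anomaly came from. A real application would interpolate velocities from ocean model output instead of calling the analytic `velocity` function, whose name and parameters are our own.

```python
import numpy as np


def velocity(t, pos):
    """Hypothetical steady eddy: solid-body rotation about (0, 0), in km per hour."""
    x, y = pos
    omega = 0.05  # rad per hour, illustrative value only
    return np.array([-omega * y, omega * x])


def rk4_track(pos0, t0, hours, dt):
    """Advect a particle with classical 4th-order Runge-Kutta.

    A negative dt integrates backwards in time (toward the particle's origin).
    Returns the positions along the trajectory, shape (n_steps + 1, 2).
    """
    n_steps = int(round(hours / abs(dt)))
    pos = np.asarray(pos0, dtype=float)
    t = t0
    path = [pos.copy()]
    for _ in range(n_steps):
        k1 = velocity(t, pos)
        k2 = velocity(t + dt / 2, pos + dt / 2 * k1)
        k3 = velocity(t + dt / 2, pos + dt / 2 * k2)
        k4 = velocity(t + dt, pos + dt * k3)
        pos = pos + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
        path.append(pos.copy())
    return np.array(path)
```

For instance, `rk4_track((10.0, 0.0), t0=0.0, hours=48.0, dt=0.5)` follows a parcel for two days, and calling `rk4_track` again from its last position with `dt=-0.5` traces it back to where it started.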

The Riverplume Workflow uses the Data Analytics Software Framework (DASF), which was developed as part of the Digital Earth project. Through its modular structure, DASF supports collaborative and distributed data analysis. The Riverplume Workflow’s main feature is an interactive map with various data visualization options that allows users to examine the data closely and either manually select a presumed anomaly for analysis or use an automatic anomaly detection algorithm based on Gaussian regression. A statistical analysis feature compares the composition of the selected data with the surrounding measurements. Simulated trajectories of particles that start on the FerryBox transect at the time of the original observation, modelled backwards and forwards in time, help verify the origin of the river plume and allow users to follow the anomaly across their area of interest. In addition, the workflow can assemble satellite-based chlorophyll observations along model trajectories into a time series. These time series allow scientists to understand processes inside the river plume and to determine the timescales on which they develop.
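The abstract mentions an automatic anomaly detection algorithm "based on Gaussian regression". Assuming this refers to Gaussian process (GP) regression, a minimal sketch could fit a smooth GP to a transect of observations and flag points that fall far below the fit. The squared-exponential kernel, the hand-picked hyperparameters (`length_scale`, `signal_var`, `noise_var`), the threshold `z_crit`, and the choice to flag only negative salinity departures are all our assumptions; in practice the hyperparameters would usually be estimated from the data, and this is not the Riverplume Workflow's code.

```python
import numpy as np


def rbf_kernel(a, b, length_scale, signal_var):
    """Squared-exponential covariance between 1-D coordinates a and b."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_anomalies(x, y, length_scale=15.0, signal_var=1.0, noise_var=0.09, z_crit=3.0):
    """Flag observations that fall well below a smooth Gaussian-process fit.

    x: positions along the transect (e.g. km); y: observations (e.g. salinity).
    Only negative departures are flagged, assuming freshwater lowers salinity.
    Returns a boolean mask and the standardized residuals.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    offset = np.median(y)  # robust constant prior mean
    k = rbf_kernel(x, x, length_scale, signal_var)
    ky = k + noise_var * np.eye(x.size)
    mean = offset + k @ np.linalg.solve(ky, y - offset)  # posterior mean at the data
    var = signal_var - np.sum(k * np.linalg.solve(ky, k), axis=0)  # posterior variance
    sd = np.sqrt(np.clip(var, 0.0, None) + noise_var)  # include observation noise
    z = (y - mean) / sd
    return z < -z_crit, z
```

Because the assumed length scale is long compared with a plume crossing, the fit follows the smooth background gradient but can only partially bend into a narrow freshwater dip, so the dip shows up as large negative standardized residuals.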

The FerryBox data used in the Riverplume Workflow are pre-processed automatically and updated daily. Synoptic drift model data are provided for all Elbe extreme events since 2013, and we plan to automate their provision as well.

We currently use the Riverplume Workflow to monitor the impacts of Elbe extreme events in the German Bight, but we plan to adapt it to other regions or types of anomalies. The Workflow’s code and all its components are available under open-source licenses and registered under the DOI https://doi.org/10.5880/GFZ.1.4.2022.006.

How to cite: Wichert, V., Brix, H., and Dransch, D.: The Riverplume Workflow - Impact of riverine extreme events on coastal biogeochemistry, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15399, https://doi.org/10.5194/egusphere-egu23-15399, 2023.

16:39–18:00