ESSI3.7

EDI
Free and Open Source Software (FOSS) and Cloud-based Technologies to Facilitate Collaborative Science

Earth science research has become increasingly collaborative through shared code and shared platforms. Researchers work together on data, software and algorithms to answer cutting-edge research questions. Teams also share these data and software with other collaborators to refine and improve these products. As data volumes continue to grow, researchers will need new platforms to both enable analysis at scale and to support the sharing of data and software.

Software is critical to the success of science. Creating and using Free and Open Source Software (FOSS) fosters contributions from the scientific community, creates a peer-reviewed and consensus-oriented environment, and promotes the sustainability of science infrastructures.

This session will look at how Free and Open Source Software (FOSS) and cloud-based architecture solutions support information sharing, scientific collaboration, scientific reproducibility and solutions that enable large-scale data analytics.

Co-organized by GI2, co-sponsored by AGU
Convener: Jens Klump | Co-conveners: Kaylin Bugbee (ECS), Horst Schwichtenberg, Anusuriya Devaraju, Wim Som de Cerff
vPICO presentations
| Wed, 28 Apr, 13:30–15:00 (CEST)

vPICO presentations: Wed, 28 Apr

Chairpersons: Jens Klump, Kaylin Bugbee, Horst Schwichtenberg
13:30–13:35
Virtual Research Environments
13:35–13:37
|
EGU21-1614
|
ECS
Philipp S. Sommer, Viktoria Wichert, Daniel Eggert, Tilman Dinter, Klaus Getzlaff, Andreas Lehmann, Christian Werner, Brenner Silva, Lennart Schmidt, and Angela Schäfer

A common challenge for projects involving multiple research institutes is establishing a well-defined and productive collaboration. All parties measure and analyze different aspects, depend on each other, share common methods, and exchange the latest results, findings, and data. Today this exchange is often impeded by a lack of ready access to shared computing and storage resources. In our talk, we present a new remote procedure call (RPC) framework. We focus on a distributed setup, where project partners do not necessarily work at the same institute and do not have access to each other's resources.

We present the prototype of an application programming interface (API) developed in Python that enables scientists to collaboratively explore and analyze sets of distributed data. It offers the functionality to request remote data through a convenient interface, and to share and invoke single computational methods or even entire analytical workflows and their results. The prototype enables researchers to make their methods accessible as a backend module running on their own infrastructure. Researchers from other institutes can then apply the available methods through a lightweight Python or JavaScript API. This API transforms standard Python calls into requests to the backend process on the remote server. As a result, the overhead for both the backend developer and the remote user is very low: the effort of implementing the necessary workflow and using the API is comparable to writing code in a non-distributed setup. Moreover, data do not have to be downloaded locally; the analysis can be executed “close to the data”, using the institutional infrastructure where the relevant data set is stored.
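The idea of turning ordinary Python calls into requests to a remote backend can be illustrated with a minimal sketch. It is not the framework's actual API: the class names `RemoteModule` and `Backend`, the JSON message layout, the example method `mean_temperature` and its data are all hypothetical, and the transport is a plain function call standing in for a network connection.

```python
import json


class RemoteModule:
    """Client-side proxy: any method call becomes a serialized request."""

    def __init__(self, send):
        # `send` delivers a JSON string to the backend and returns the JSON
        # reply. Hypothetical transport; a real setup would use the network.
        self._send = send

    def __getattr__(self, name):
        def call(*args, **kwargs):
            request = json.dumps({"method": name, "args": args, "kwargs": kwargs})
            reply = json.loads(self._send(request))
            if "error" in reply:
                raise RuntimeError(reply["error"])
            return reply["result"]

        return call


class Backend:
    """Server-side dispatcher that runs next to the data."""

    def __init__(self):
        self._methods = {}

    def expose(self, func):
        # Register a function so that remote users can invoke it by name.
        self._methods[func.__name__] = func
        return func

    def handle(self, raw_request):
        request = json.loads(raw_request)
        func = self._methods.get(request["method"])
        if func is None:
            return json.dumps({"error": "unknown method: " + request["method"]})
        result = func(*request["args"], **request["kwargs"])
        return json.dumps({"result": result})


backend = Backend()

# Hypothetical data that stays on the institute's own infrastructure.
STATION_DATA = {"A": [1.0, 2.0, 3.0]}


@backend.expose
def mean_temperature(station):
    values = STATION_DATA[station]
    return sum(values) / len(values)


# The remote user writes what looks like a local call.
remote = RemoteModule(backend.handle)
print(remote.mean_temperature("A"))
```

Only the request and the small result cross the boundary between client and backend; the station data are never transferred.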

With our prototype, we demonstrate distributed data access and analysis workflows across institutional borders to enable effective scientific collaboration, thus deepening our understanding of the Earth system.

This framework has been developed in a joint effort of the DataHub and Digital Earth initiatives within the Research Centers of the Helmholtz-Gemeinschaft Deutscher Forschungszentren e.V. (Helmholtz Association of German Research Centres, HGF).

How to cite: Sommer, P. S., Wichert, V., Eggert, D., Dinter, T., Getzlaff, K., Lehmann, A., Werner, C., Silva, B., Schmidt, L., and Schäfer, A.: A new distributed data analysis framework for better scientific collaborations, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-1614, https://doi.org/10.5194/egusphere-egu21-1614, 2021.

13:37–13:39
|
EGU21-4432
Vincent Fazio, Carsten Friedrich, Rini Angreani, Pavel Golodoniuc, John Hille, Alex Hunt, LingBo Jiang, Jens Klump, Geoffrey Squire, Peter Warren, Ulrich Engelke, Stuart Woodman, and Sam Bradley

As open source geospatial mapping toolkits and platforms continue to develop and mature, the developers of web portals using these solutions need to regularly review and re-evaluate their technology choices in order to stay up to date and provide the best possible experience and functionality to their users. We are currently undergoing such a refresh with our AuScope Discovery Portal, Virtual Geophysics Laboratory, and the AuScope 3D Geological Models Portal. The task of deciding which solutions to utilise as part of the upgrade process is not to be underestimated. Our main evaluation criteria include the ability to support commonly used map layer formats and web service protocols, support for 3D display capabilities, community size and activity, ease of adding custom display and scientific workflow / processing widgets, cost and benefits of integration with existing components, and maintainability into the future. We are beginning a journey to update and integrate our portals’ functionality and will outline the decision process and conclusions of our investigations as well as the detailed evaluation of web-based geospatial solutions against our functional and operational criteria.

How to cite: Fazio, V., Friedrich, C., Angreani, R., Golodoniuc, P., Hille, J., Hunt, A., Jiang, L., Klump, J., Squire, G., Warren, P., Engelke, U., Woodman, S., and Bradley, S.: Assembling a geoscience information portal from pieces of the open source software jigsaw puzzle, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-4432, https://doi.org/10.5194/egusphere-egu21-4432, 2021.

13:39–13:41
|
EGU21-6164
Mathias Bavay, Michael Reisecker, Thomas Egger, and Daniela Korhammer

As numerical model developers, we have experienced first hand how most users struggle with the configuration of the models, leading to numerous support requests. Such issues are usually mitigated by offering a Graphical User Interface (GUI) that flattens the learning curve. This requires however a significant investment for the model developer as well as a specific skill set. Moreover, this does not fit with the daily duties of model developers. As a consequence, when a GUI has been created -- usually within a specific project and often relying on an intern -- the maintenance either constitutes a major burden or is not performed. This also tends to limit the evolution of the numerical models themselves, since the model developers try to avoid having to change the GUI.

To circumvent that problem, we have developed Inishell [1], a C++/Qt application based on an XML description of the inputs required by the numerical model that generates a GUI on the fly. This makes maintenance of the GUI very simple and enables users to easily get an up-to-date GUI for configuring the numerical model. The first version of this tool was written almost ten years ago and showed that the concept works very well for our own surface processes models. A full rewrite offering a more modern interface and extended capabilities is presented here.
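The principle of deriving a GUI from a declarative description can be sketched in a few lines. The XML layout below is invented for illustration and is not Inishell's actual schema; likewise, the widget names are placeholders and the sketch stops at a list of widget specifications instead of creating Qt widgets.

```python
import xml.etree.ElementTree as ET

# Hypothetical description of a model's inputs (not the Inishell format).
DESCRIPTION = """
<model name="demo">
  <parameter key="TIME_STEP" type="number" default="60" label="Time step (s)"/>
  <parameter key="SCHEME" type="choice" label="Numerical scheme">
    <option value="explicit"/>
    <option value="implicit"/>
  </parameter>
  <parameter key="OUTPUT_DIR" type="text" default="./out" label="Output directory"/>
</model>
"""

# Placeholder mapping from parameter type to a widget kind.
WIDGET_FOR_TYPE = {"number": "SpinBox", "choice": "ComboBox", "text": "LineEdit"}


def build_form(xml_text):
    """Translate the XML description into a list of widget specifications."""
    root = ET.fromstring(xml_text.strip())
    form = []
    for param in root.findall("parameter"):
        form.append(
            {
                "key": param.get("key"),
                "label": param.get("label"),
                "widget": WIDGET_FOR_TYPE[param.get("type")],
                "default": param.get("default"),
                "options": [opt.get("value") for opt in param.findall("option")],
            }
        )
    return form


for spec in build_form(DESCRIPTION):
    print(spec["label"], "->", spec["widget"])
```

With such a design, adding a model parameter only requires a new entry in the description; the form-building code itself does not change.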

 

[1] Bavay, M., Reisecker, M., Egger, T., and Korhammer, D., “Inishell 2.0: Semantically driven automatic GUI generation for scientific models”, Geosci. Model Dev. Discuss. [preprint], https://doi.org/10.5194/gmd-2020-339, in review, 2020.

How to cite: Bavay, M., Reisecker, M., Egger, T., and Korhammer, D.: Inishell 2.0: Semantically driven automatic GUI generation for scientific models, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-6164, https://doi.org/10.5194/egusphere-egu21-6164, 2021.

Free and Open Source Software
13:41–13:43
|
EGU21-9408
|
ECS
Armin Hahn, Wiard Frühling, and Jan Schlüter

Routing on a road network requires geographical points on the road network that correspond best to the addresses of the given origin and destination, here called snapping points. The technique to determine such snapping points is also called offline map matching. Conventional routing machines use the shortest perpendicular distance from a building’s centroid to the road network for this purpose. However, in some cases, this technique leads to suboptimal results when the access to a building is not reachable from the road segment with the shortest perpendicular distance. We used open-source data (multispectral images, OpenStreetMap data, and Light Detection and Ranging (LiDAR) data) to perform a cost-distance analysis and determined the most likely access to buildings. To this end, we assumed that the path to the building shows less vegetation cover, follows minimal terrain slope and avoids building footprints. Our results are validated against a predetermined Ideal Snapping Area for different weightings of the parameters vegetation, slope and building footprints. We also compared our results with a conventional routing machine (Open Source Route Machine - ) that uses the perpendicular distance. The validation rate of our approach is up to 90%, depending on the weighting of the chosen parameters, whereas the conventional routing machine shows a validation rate of 81%. The optimized snapping points can be used to determine enhanced stop locations in passenger transport to improve services such as door-to-door transportation (e.g. demand-responsive transport).
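For reference, the conventional baseline (snapping a building centroid to the road segment with the shortest perpendicular distance) can be sketched as follows. This is a generic planar implementation under the assumption of projected coordinates; it is not the code of any particular routing machine and does not implement the cost-distance analysis described above.

```python
import math


def snap_to_segment(p, a, b):
    """Return the point on segment a-b that is closest to p (planar coordinates)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return a  # degenerate segment: both end points coincide
    # Parameter of the perpendicular foot along the segment, clamped to [0, 1]
    # so that the result never leaves the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def snap_to_network(p, segments):
    """Return (snapping point, distance) for the nearest of all road segments."""
    best = None
    for a, b in segments:
        q = snap_to_segment(p, a, b)
        d = math.hypot(q[0] - p[0], q[1] - p[1])
        if best is None or d < best[1]:
            best = (q, d)
    return best


roads = [((0, 0), (2, 0)), ((0, 3), (2, 3))]
print(snap_to_network((1, 1), roads))
```

The baseline considers geometry only, which is why it can select a road from which the building entrance cannot actually be reached.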

How to cite: Hahn, A., Frühling, W., and Schlüter, J.: Using open-source high resolution remote sensing data to determine the access to buildings in the context of passenger transport, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-9408, https://doi.org/10.5194/egusphere-egu21-9408, 2021.

13:43–13:45
|
EGU21-9602
Edzer Pebesma, Patrick Griffiths, Christian Briese, Alexander Jacob, Anze Skerlevaj, Jeroen Dries, Gilberto Camara, and Matthias Mohr

The OpenEO API allows the analysis of large amounts of Earth Observation data using a high-level abstraction of data and processes. Rather than focusing on the management of virtual machines and millions of imagery files, it allows users to create jobs that take a spatio-temporal section of an image collection (such as Sentinel L2A) and treat it as a data cube. Processes iterate or aggregate over pixels, spatial areas, spectral bands, or time series, while working at arbitrary spatial resolution. This pattern, pioneered by Google Earth Engine™ (GEE), lets the user focus on the science rather than on data management.
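The data cube pattern can be illustrated without any Earth Observation infrastructure. The sketch below is plain Python and does not use the openEO API or its clients; the function name `reduce_time` and the tiny example cube are assumptions made for illustration. It aggregates each pixel's time series into one value, the kind of operation a back-end would run next to the data.

```python
def reduce_time(cube, reducer):
    """Collapse cube[t][y][x] to image[y][x] by reducing each pixel's time series."""
    n_y = len(cube[0])
    n_x = len(cube[0][0])
    return [
        [reducer([layer[y][x] for layer in cube]) for x in range(n_x)]
        for y in range(n_y)
    ]


def mean(values):
    return sum(values) / len(values)


# Two time steps of a 2 x 2 image (made-up values).
cube = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[3.0, 4.0], [5.0, 6.0]],
]

print(reduce_time(cube, mean))
print(reduce_time(cube, max))
```

The user only states which dimension to reduce and with which function; how the imagery is stored and tiled is left to the back-end.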

The openEO H2020 project (2017-2020) has developed the API as well as an ecosystem of software around it, including clients (JavaScript, Python, R, QGIS, browser-based), back-ends that translate API calls into existing image analysis or GIS software or services (for Sentinel Hub, WCPS, Open Data Cube, GRASS GIS, GeoTrellis/GeoPySpark, and GEE) as well as a hub that allows querying and searching openEO providers for their capabilities and datasets. The project demonstrated this software in a number of use cases, where identical processing instructions were sent to different implementations, allowing comparison of returned results.

A follow-up, ESA-funded project “openEO Platform” realizes the API and progresses the software ecosystem into operational services and applications that are accessible to everyone, that involve federated deployment (using the clouds managed by EODC, Terrascope, CreoDIAS and EuroDataCube), that will provide payment models (“pay per compute job”) conceived and implemented following the user community needs and that will use the EOSC (European Open Science Cloud) marketplace for dissemination and authentication. A wide range of large-scale case studies will demonstrate the ability of the openEO Platform to scale to large data volumes. The case studies to be addressed include on-demand ARD generation for SAR and multi-spectral data, agricultural demonstrators like crop type and condition monitoring, forestry services like near real time forest damage assessment as well as canopy cover mapping, environmental hazard monitoring of floods and air pollution as well as security applications in terms of vessel detection in the Mediterranean Sea.

While the landscape of cloud-based EO platforms and services has matured and diversified over the past decade, we believe there are strong advantages for scientists and government agencies to adopt the openEO approach. Beyond the absence of vendor/platform lock-in or EULAs, we mention the abilities to (i) run arbitrary user code (e.g. written in R or Python) close to the data, (ii) carry out scientific computations on an entirely open source software stack, (iii) integrate different platforms (e.g., different cloud providers offering different datasets), and (iv) help create and extend this software ecosystem. openEO uses the OpenAPI standard, aligns with modern OGC API standards, and uses the STAC (SpatioTemporal Asset Catalog) to describe image collections and image tiles.

How to cite: Pebesma, E., Griffiths, P., Briese, C., Jacob, A., Skerlevaj, A., Dries, J., Camara, G., and Mohr, M.: Analyzing large-scale Earth Observation data repositories made simple with OpenEO Platform, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-9602, https://doi.org/10.5194/egusphere-egu21-9602, 2021.

13:45–13:47
|
EGU21-9632
|
ECS
Thomas Seidler, Norbert Schultz, Dr. Markus Quade, Christian Autermann, Dr. Benedikt Gräler, and PD Dr. Markus Abel

Earth system modeling is virtually impossible without dedicated data analysis. Typically, data are big and, due to the complexity of the system, adequate tools for the analysis lie in the domain of machine learning or artificial intelligence. However, earth system specialists have expertise other than developing and deploying the state-of-the-art programming code needed to use modern software frameworks and computing resources efficiently. In addition, Cloud and HPC infrastructure are frequently needed to run analyses with data beyond Tera- or even Petascale volume, with corresponding requirements on available RAM, GPU and CPU sizes.

Inside the KI:STE project (www.kiste-project.de), we extend the concepts of an existing project, the Mantik-platform (www.mantik.ai), such that handling of data and algorithms is facilitated for earth system analyses while abstracting technical challenges such as scheduling and monitoring of training jobs and platform specific configurations away from the user.

The principles for design are collaboration and reproducibility of algorithms from the first data load to the deployment of a model to a cluster infrastructure. In addition to the executive part where code is developed and deployed, the KI:STE project develops a learning platform where dedicated topics in relation to earth system science are systematically and pedagogically presented.

In this presentation, we show the architecture and interfaces of the KI:STE platform together with a simple example.

How to cite: Seidler, T., Schultz, N., Quade, Dr. M., Autermann, C., Gräler, Dr. B., and Abel, P. Dr. M.: Easing and promoting the application of ML and AI in earth system sciences - introducing the KI:STE platform, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-9632, https://doi.org/10.5194/egusphere-egu21-9632, 2021.

13:47–13:49
|
EGU21-12467
Milan Antonovic, Massimiliano Cannata, Nils Oesterling, and Sabine Brodhag

Most of the time, borehole data, particularly those collected in the past, are in the form of static data reports that describe the stratigraphy and the related characteristics; these data are generally available as paper documents or static files such as .pdf or images (.ai). While very informative, these documents are neither searchable nor interoperable nor easily reusable, since they require a non-negligible amount of time for data integration. Sometimes, data are archived in a database. This certainly improves the findability and accessibility of the data but still does not address the interoperability requirement; therefore, combining data from different sources remains a problematic task. To enable FAIR borehole data and facilitate data management by the different entities (public or private), Swisstopo (www.swisstopo.ch) has funded the development of a Web application named Borehole Data Management System (BDMS) [1] that adopts the borehole data model () [2] implemented by the Swiss Geological Survey. Since the first beta release (2019), several improvements to the platform have been implemented, leading to the latest official release of the platform (v1.0.2), available on www.swissforages.ch. The latest released features include:

  • Borehole document storage
  • Interface customization
  • Improved access & authorization management
  • External WMS/WMTS background map support
  • User feedback form
  • Handling of personalized and versioned terms of service
  • Enhanced bulk data import
  • Minor enhancements and bug fixes

 

How to cite: Antonovic, M., Cannata, M., Oesterling, N., and Brodhag, S.: Swissforages: the Free and Open-Source Borehole Data Management System, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12467, https://doi.org/10.5194/egusphere-egu21-12467, 2021.

13:49–13:51
|
EGU21-4031
|
ECS
Giacomo Nodjoumi, Luca Guallini, Roberto Orosei, Luca Penasa, and Angelo Pio Rossi

The objective of this work is to present new Free and Open-Source Software (FOSS) that reads data acquired by the Mars Advanced Radar for Subsurface and Ionosphere Sounding (MARSIS) instrument on board Mars Express (MEX), orbiting Mars since 2005, and converts them to multiple data formats.

MARSIS is an orbital synthetic aperture radar sounder that operates with dual-frequency between 1.3 and 5.5 MHz and wavelengths between 230 and 55 m for subsurface sounding. The Experiment Data Record (EDR) and Reduced Data Record (RDR) datasets are available for download on public access platforms such as the Planetary Science Archive of ESA and the PDS-NASA Orbital Data Explorer (ODE).

These datasets have been widely used in research focused on the subsurface of the red planet up to a depth of a few kilometres, especially for studying ice caps and looking for subsurface ice and water deposits, and have produced relevant results (Lauro et al., 2020; Orosei et al., 2020).

The Python tool presented here is capable of reading the common data types used to distribute MARSIS datasets and converting them into multiple data formats. Users can interactively configure the data source, destination, pre-processing and type of output, choosing among:

  • Geopackages: for GIS software; a single self-contained file containing a layer that stores all parameters for each file processed.
  • Numpy array dump: for fast reading and analysis of the original data for both frequencies.
  • PNG images: for fast inspection; created and saved for each frequency. Image pre-processing filters, such as image denoising, standardization and normalization, can be selected by the user.
  • SEG-Y: for analysing data with seismic interpretation and processing software (e.g. OpendTect); consists of one SEG-Y file for each frequency.
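As an illustration of two of the pre-processing options named in the list above, the sketch below applies min-max normalization and standardization to one trace of samples. It is a generic, standard-library-only example with made-up numbers; the function names are assumptions, and the filters in the tool itself may be implemented differently.

```python
import statistics


def normalize(samples):
    """Min-max normalization: rescale the samples linearly to the range [0, 1]."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return [0.0 for _ in samples]  # constant trace: avoid division by zero
    return [(s - lo) / (hi - lo) for s in samples]


def standardize(samples):
    """Standardization: shift to zero mean and scale to unit standard deviation."""
    mu = statistics.mean(samples)
    sigma = statistics.pstdev(samples)
    if sigma == 0:
        return [0.0 for _ in samples]
    return [(s - mu) / sigma for s in samples]


trace = [2.0, 4.0, 6.0]  # hypothetical echo amplitudes
print(normalize(trace))
print(standardize([1.0, 3.0]))
```

Normalization fixes the value range, which suits image export, whereas standardization makes traces with different overall power comparable.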

SEG-Y capability is the most relevant feature, since it is not present in any other FOSS tool and gives researchers the possibility to visualize radargrams in advanced software designed for seismic interpretation and analysis, making it possible to interpret the data in a fully three-dimensional environment.

This tool, available on Zenodo (Nodjoumi, 2021), has been developed entirely in Python 3, relying only on open-source libraries. It is compatible with the principal operating systems and has parallel processing capabilities, granting easy scalability and usability across a wide range of computing machines. It is also highly customizable, since it can be expanded by adding processing steps before export or new types of output. An additional module to ingest data directly into PostgreSQL/PostGIS and a module to interact directly with the ACT-REACT interface of data platforms are under development.

Acknowledgments:

This study is within the Europlanet 2024 RI, and it has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 871149. 

References:

Lauro, S. E. et al. (2020) ‘Multiple subglacial water bodies below the south pole of Mars unveiled by new MARSIS data’, doi: 10.1038/s41550-020-1200-6.

Nodjoumi, G. (2021) 'MARSIS-xDR-READER', doi: 10.5281/zenodo.4436199

Orosei, R. et al. (2020) ‘The global search for liquid water on mars from orbit: Current and future perspectives’, doi: 10.3390/life10080120.

How to cite: Nodjoumi, G., Guallini, L., Orosei, R., Penasa, L., and Rossi, A. P.: New open source tools for MARSIS: providing access to SEG-Y data format for 3D analysis., EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-4031, https://doi.org/10.5194/egusphere-egu21-4031, 2021.

Cloud and High-Performance Computing
13:51–13:53
|
EGU21-2053
|
ECS
Pavel Golodoniuc, Januka Attanayake, Abraham Jones, and Samuel Bradley

Detecting and locating earthquakes relies on seismic events being recorded by a number of deployed seismometers. To detect earthquakes effectively and accurately, seismologists must design and install a network of seismometers that can capture small seismic events in the sub-surface.

A major challenge when deploying an array of seismometers (seismic array) is predicting the smallest earthquake that could be detected and located by that network. Varying the spacing and number of seismometers dramatically affects network sensitivity and location precision and is very important when researchers are investigating small-magnitude local earthquakes. For cost reasons, it is important to optimise network design before deploying seismometers in the field. In doing so, seismologists must accurately account for parameters such as station locations, site-specific noise levels, earthquake source parameters, seismic velocity and attenuation in the wave propagation medium, signal-to-noise ratios, and the minimum number of stations required to compute high-quality locations.

The AuScope AVRE Engage Program team has worked with researchers from the seismology team at the University of Melbourne to better understand their current solution for optimising seismic array design: an analytical method called SENSI, developed by Tramelli et al. (2013), which has been used to design seismic networks, including the GipNet array deployed to monitor seismicity in the Gippsland region in Victoria, Australia. The underlying physics and mechanics of the method are straightforward and, when applied sensibly, can be used as a basis for the design of seismic networks anywhere in the world. Our engineers have built an application leveraging a previously developed Geophysical Processing Toolkit (GPT) as an application platform and harnessed the scalability of a Cloud environment provided by the EASI Hub, which minimised the overall development time. The GPT application platform provided the groundwork for a web-based application interface and enabled interactive visualisations to facilitate human-computer interaction and experimentation.

How to cite: Golodoniuc, P., Attanayake, J., Jones, A., and Bradley, S.: Development of an interactive Cloud-based seismic network modelling application on a common Geophysical Processing Toolkit platform, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-2053, https://doi.org/10.5194/egusphere-egu21-2053, 2021.

13:53–13:55
|
EGU21-12152
Guillaume Drouen, Daniel Schertzer, and Ioulia Tchiguirinskaia

As cities are put under greater pressure from the threat of impacts of climate change, in particular the risk of heavier rainfall and flooding, there is a growing need to establish a hierarchical form of resilience in which critical infrastructures can become sustainable. The main difficulty is that geophysics and urban dynamics are strongly nonlinear with an associated, extreme variability over a wide range of space-time scales.

The polarimetric X-band radar at the ENPC’s campus (east of Paris) introduced a paradigm change in the prospects of environmental monitoring in Île-de-France. The radar has been in operation since May 2015 and has several characteristics that make it of central importance for the environmental monitoring of the region.

Based on the radar data and other scientific measurement tools, the platform for greater Paris was developed in participative co-creation and in scientific collaboration with the world-leading industrial company in water management. Because data accessibility and a fast and reliable infrastructure were major requirements of the scientific community, the platform was built as a cloud-based solution. It provides scientific weather specialists, as well as water managers, with a fast and stable platform accessible from their web browser on desktop and mobile displays.

It was developed using free and open-source libraries and is built on an integrated suite of modular components based on an asynchronous event-driven JavaScript runtime environment. It includes a comprehensive database accessible in real time and also provides tools to analyse historical data on different time and geographic scales around greater Paris.

The Fresnel SaaS (Software as a Service) cloud-based platform is an example of how current IT tools can dynamically enhance urban resilience. Developments are still in progress, in constant request and feedback loops with the scientific and professional world.

How to cite: Drouen, G., Schertzer, D., and Tchiguirinskaia, I.: New cloud-based tool to Dynamically Manage Urban Resilience: the Fresnel Platform for Greater Paris, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12152, https://doi.org/10.5194/egusphere-egu21-12152, 2021.

13:55–13:57
|
EGU21-14441
Simon Jirka, Benedikt Gräler, Matthes Rieke, and Christian Autermann

For many scientific domains such as hydrology, ocean sciences, geophysics and social sciences, geospatial observations are an important source of information. Scientists conduct extensive measurement campaigns or operate comprehensive monitoring networks to collect data that help to understand and to model current and past states of complex environments. The variety of data underpinning research stretches from in-situ observations to remote sensing data (e.g., from the European Copernicus programme) and contributes to rapidly increasing volumes of geospatial data.

However, with the growing amount of available data, new challenges arise. Within our contribution, we will focus on two specific aspects: On the one hand, we will discuss the specific challenges which result from the large volumes of remote sensing data that have become available for answering scientific questions. For this purpose, we will share practical experiences with the use of cloud infrastructures such as the German platform CODE-DE and will discuss concepts that enable data processing close to the data stores. On the other hand, we will look into the question of interoperability in order to facilitate the integration and collaborative use of data from different sources. For this aspect, we will give special consideration to the currently emerging new generation of standards of the Open Geospatial Consortium (OGC) and will discuss how specifications such as the OGC API for Processes can help to provide flexible processing capabilities directly within Cloud-based research data infrastructures.

How to cite: Jirka, S., Gräler, B., Rieke, M., and Autermann, C.: Cloud-based Research Data Infrastructures Integrating In-Situ and Remote Sensing Data, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14441, https://doi.org/10.5194/egusphere-egu21-14441, 2021.

13:57–13:59
|
EGU21-15302
José Manuel Delgado Blasco, Antonio Romeo, David Heyns, Natassa Antoniou, and Rob Carrillo

The OCRE project, an H2020 project funded by the European Commission, aims to increase the usage of Cloud and EO services by the European research community. It makes 9.5M euro of EC funds available, aiming to remove the barriers to service discovery and to provide services free at the point of use.

Through open calls in 2019-2020, OCRE started to award funding to EU research projects for using Cloud commodity and EO services procured by OCRE. Further open calls are foreseen in 2021 for projects wishing to receive funds for using EO services procured by the OCRE project, as is a permanent open call for individual researchers.

During 2020, OCRE also funded, through another open call, EU projects dealing with research related to COVID-19 and they were the first projects that started the usage of the available commodity services. Additionally, in 2020, the OCRE project closed and awarded EU service providers for the provision of cloud and commodity services and, in early 2021, the Dynamic Purchasing System (DPS) for the procurement of EO services will be opened.

Additionally, during 2020 an External Advisory Board (EAB) was created to assist OCRE in the project awarding process. The EAB is formed by recognized experts from different domains providing OCRE with the balanced knowledge needed to ensure transparency and equality in such an important process.

This presentation will provide an overview of the possibilities offered by OCRE to researchers interested in boosting their activities using commercial cloud services.

How to cite: Delgado Blasco, J. M., Romeo, A., Heyns, D., Antoniou, N., and Carrillo, R.: OCRE: started the funding opportunities for the European research community for using OCRE’s procured Cloud and Earth Observation commercial services., EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15302, https://doi.org/10.5194/egusphere-egu21-15302, 2021.

13:59–14:01
|
EGU21-2442
|
ECS
Marco Kulüke, Fabian Wachsmann, Georg Leander Siemund, Hannes Thiemann, and Stephan Kindermann

This study provides guidance to data providers on how to transfer existing NetCDF data from a hierarchical storage system to an object storage system in Zarr format.

In recent years, object storage systems have become an alternative to traditional hierarchical file systems because they are easily scalable and offer faster data retrieval.

Earth system sciences, and climate science in particular, handle large amounts of data. These data are usually represented as multi-dimensional arrays and traditionally stored in netCDF format on hierarchical file systems. However, the current netCDF-4 format is not yet optimized for object storage systems. NetCDF data transfers from an object store can only be conducted at file level, which results in heavy download volumes. The Zarr format can mitigate this problem: it reduces data transfers through direct access to chunks and metadata and hence increases input/output speed in parallel computing environments.
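Why chunk-level access reduces transfer volume can be shown with a small calculation. The sketch below does not use the Zarr package; it only computes which chunk objects a rectangular selection touches, using the dot-separated chunk keys of the Zarr version 2 format. The array shape, chunk shape and selection are made-up values.

```python
import math
from itertools import product


def chunks_for_selection(selection, chunk_shape):
    """Return the keys of all chunks that overlap the selection.

    `selection` holds one (start, stop) pair per dimension, stop exclusive.
    """
    index_ranges = []
    for (start, stop), size in zip(selection, chunk_shape):
        first = start // size
        last = (stop - 1) // size
        index_ranges.append(range(first, last + 1))
    return [".".join(str(i) for i in idx) for idx in product(*index_ranges)]


def total_chunks(shape, chunk_shape):
    """Number of chunk objects that make up the whole array."""
    count = 1
    for n, size in zip(shape, chunk_shape):
        count *= math.ceil(n / size)
    return count


# Hypothetical daily field: (time, lat, lon) with chunks of 30 x 90 x 90 values.
shape = (365, 180, 360)
chunk_shape = (30, 90, 90)

# First 30 days, northern half of the latitudes, a 100-degree longitude window.
needed = chunks_for_selection([(0, 30), (0, 90), (100, 200)], chunk_shape)
print(needed, "of", total_chunks(shape, chunk_shape), "chunks")
```

In this example the request is served from two chunk objects, whereas file-level access would have to download the entire array.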

As one of the largest climate data providers worldwide, the German Climate Computing Center (DKRZ) continuously works towards efficient ways to make data accessible for the user. This use case shows the conversion and the transfer of a subset of the Coupled Model Intercomparison Project Phase 6 (CMIP6) climate data archive from netCDF on the hierarchical file system into Zarr on the OpenStack object store, known as Swift, using the Zarr Python package. Finally, this study will evaluate to what extent Zarr-formatted climate data on an object storage system is a meaningful addition to the existing high performance computing environment of the DKRZ.

How to cite: Kulüke, M., Wachsmann, F., Siemund, G. L., Thiemann, H., and Kindermann, S.: Transfer Data from NetCDF on Hierarchical Storage to Zarr on Object Storage: CMIP6 Climate Data Use Case, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-2442, https://doi.org/10.5194/egusphere-egu21-2442, 2021.

14:01–14:03
|
EGU21-3205
|
ECS
Alessandro Spinuso, Friedrich Striewski, Ian van der Neut, Mats Veldhuizen, Tor Langeland, Christian Page, and Daniele Bailo

Modern interactive tools for data analysis and visualisation are designed to expose their functionalities as a service through the web. We present an open source web API (SWIRRL) that allows Science Gateways to easily integrate such tools in their websites and re-purpose them for their users. The API, developed in the context of the ENVRIFair and IS-ENES3 EU projects, deals on behalf of the clients with the underlying complexity of allocating and managing resources within a target container orchestration platform on the cloud. By combining storage and third-party tools, such as JupyterLab and the Enlighten visualisation software, the API creates dedicated working sessions on demand. Thanks to the API’s staging workflows, SWIRRL sessions can be populated with data of interest collected from external data providers. The system is designed to offer customisation and reproducibility thanks to the recording of provenance, which is performed for each API method that affects the session. This is implemented by combining a PROV-Templates catalogue and a graph database, which are deployed as independent microservices. Notebooks can be customised with new or updated libraries, and the provenance of such changes is then exposed to users via the SWIRRL interactive JupyterLab extension. Here, users can control different types of reproducibility actions. For instance, they can restore the libraries and data used within the notebook in the past, as well as create snapshots of the running environment. This allows users to share and rebuild full Jupyter workspaces, including raw data and user-generated methods. Snapshots are stored in Git as Binder repositories and are thereby compatible with mybinder.org. Finally, we will discuss how SWIRRL is and will be adopted by existing portals for Climate analysis (Climate4Impact) and for Solid Earth Science (EPOS), where advanced data discovery capabilities are combined with customisable, recoverable and reproducible workspaces.

How to cite: Spinuso, A., Striewski, F., van der Neut, I., Veldhuizen, M., Langeland, T., Page, C., and Bailo, D.: SWIRRL API for provenance-aware and reproducible workspaces. The EPOS and IS-ENES approach., EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-3205, https://doi.org/10.5194/egusphere-egu21-3205, 2021.

14:03–14:05
|
EGU21-5489
Dirk Barbi, Miguel Andrés-Martínez, Deniz Ural, Luisa Cristini, Paul Gierz, and Nadine Wieters

During the last two decades, modern societies have gradually recognised the urgency of tackling the climate change challenge, and consequently, a growing number of national and international initiatives have been launched with the aim of better understanding the Earth System. In this context, Earth System Modelling (ESM) has rapidly expanded, leading to a large number of research groups targeting the many components of the system at different scales and with different levels of interactions between components. This has led to the development of an increasing number of models, couplings, versions tuned to address different scales or scenarios, and model-specific compilation and operating procedures. This operational complexity makes the implementation of multiple models excessively time consuming, especially for less experienced modellers.

ESM-Tools is open-source modular software written in Python, designed to overcome many of the difficulties associated with the operation of ESMs. ESM-Tools allows for downloading, compiling and running a wide range of ESM models and coupled setups on the most important HPC facilities available in Germany. It currently supports multiple models for ocean, atmosphere, biochemistry, ice sheet, isostatic adjustment, hydrology, and land-surface, and six ocean-atmosphere and two ice-sheet-ocean-atmosphere coupled setups, through two couplers (included modularly through ESM-Interface). The tools are coded in Python while all the component and coupling information is contained in easy-to-read YAML files. The front-end user is required to provide only a short script written in YAML format, containing the experiment-specific definitions. This user-friendly interface makes ESM-Tools convenient software for training and educational purposes. At the same time, its modularity and the separation between the component-specific information and tool scripts facilitate the implementation and maintenance of new components, couplings and versions. The ESM-Tools team of scientific programmers also provides user support, workshops and detailed documentation. ESM-Tools was developed within the framework of the project Advance Earth System Model Capacity, supported by the Helmholtz Association, and has become one of the main pillars of the German infrastructure for Climate Modelling.

How to cite: Barbi, D., Andrés-Martínez, M., Ural, D., Cristini, L., Gierz, P., and Wieters, N.: ESM-Tools Version 5.0: A modular infrastructure for stand-alone and coupled Earth System Modelling (ESM), EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-5489, https://doi.org/10.5194/egusphere-egu21-5489, 2021.

14:05–14:07
|
EGU21-10831
|
ECS
Jaro Hokkanen, Stefan Kollet, Jiri Kraus, Andreas Herten, Markus Hrywniak, and Dirk Pleiter

Rapidly changing heterogeneous supercomputer architectures pose a great challenge to many scientific communities trying to leverage the latest technology in high-performance computing. Implementations that deliver both good performance and developer productivity while keeping the codebase adaptable and maintainable in the long term are of high importance. ParFlow, a widely used hydrologic model, achieves these attributes by hiding the architecture-dependent code in preprocessor macros (ParFlow embedded Domain Specific Language, eDSL) and leveraging NVIDIA's Unified Memory technology for memory management. The implementation results in very good weak scaling with up to 26x speedup when using four NVIDIA A100 GPUs per node compared to using the available 48 CPU cores. Good weak scaling is observed using hundreds of nodes on the new JUWELS Booster system at the Jülich Supercomputing Centre, Germany. Furthermore, it is possible to couple ParFlow with other earth system compartment models such as land surface and atmospheric models using the OASIS-MCT coupler library, which handles the data exchange between the different models. The ParFlow GPU implementation is fully compatible with the coupled implementation with few changes to the source code. Moreover, coupled simulations offer interesting load-balancing opportunities for optimal usage of the existing resources. For example, running ParFlow on GPU nodes and another application component on CPU-only nodes, or efficiently distributing the CPU and GPU resources of a single node between the different application components, may result in the best usage of heterogeneous architectures.

How to cite: Hokkanen, J., Kollet, S., Kraus, J., Herten, A., Hrywniak, M., and Pleiter, D.: Coupled earth system modeling on heterogeneous HPC architectures with ParFlow in the Terrestrial Systems Modeling Platform, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10831, https://doi.org/10.5194/egusphere-egu21-10831, 2021.

14:07–14:09
|
EGU21-12209
Stefan Versick, Thomas Fischer, Ole Kirner, Tobias Meisel, and Jörg Meyer

Earth System Models (ESMs) have become much more demanding in recent years. Modelled processes have become more complex, and more and more processes are considered in the models. In addition, model resolutions have increased to improve the accuracy of predictions. This requires faster high-performance computers (HPC) and better I/O performance. One way to improve I/O performance is to use faster file systems. Last year we showed the impact of an ad-hoc file system on the performance of the ESM EMAC. An ad-hoc file system is a private parallel file system which is created on demand for an HPC job using the node-local storage devices, in our case solid-state disks (SSDs). It only exists during the runtime of the job. Therefore output data have to be moved to a permanent file system before the job finishes. Performance improvements are due to the use of SSDs in the case of small chunks of I/O or a high number of I/O operations per second. Another reason for the performance boost is that the running job has exclusive access to the file system. To get a better overview of the cases in which ESMs benefit from using ad-hoc file systems, we repeated our performance tests with further ESMs with different I/O strategies. In total we have now analyzed EMAC (parallel netcdf), ICON2.5 (netcdf with asynchronous I/O), ICON2.6 (netcdf with Climate Data Interface (CDI) library) and OpenGeoSys (parallel VTU).
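The stage-out step described above (write to a file system that only lives as long as the job, then move the results to permanent storage before the job ends) can be sketched in a few lines. This is only an illustration of the pattern under simplifying assumptions: a temporary directory stands in for the ad-hoc file system, and `run_with_scratch` and `toy_job` are hypothetical names, not part of any of the models or tools mentioned.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_scratch(job, permanent_dir):
    """Run `job` against a short-lived scratch directory and stage out its files.

    The scratch directory plays the role of the ad-hoc file system: it is
    created for this run only and is deleted when the `with` block exits,
    so outputs must be copied to `permanent_dir` before that happens.
    """
    permanent_dir = Path(permanent_dir)
    permanent_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    with tempfile.TemporaryDirectory() as scratch:
        scratch = Path(scratch)
        job(scratch)  # the model writes all of its output here
        for path in sorted(scratch.iterdir()):
            if path.is_file():
                shutil.copy2(path, permanent_dir / path.name)
                staged.append(path.name)
    return staged


def toy_job(outdir):
    """Stand-in for a model run that writes a few small output files."""
    for step in range(3):
        (outdir / f"output_{step:03d}.txt").write_text(f"step {step}\n")
```

Calling `run_with_scratch(toy_job, "results")` would leave three files in `results` while the scratch directory itself is gone. On a real system the copy would typically be done by the job script or by tooling provided with the ad-hoc file system.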

How to cite: Versick, S., Fischer, T., Kirner, O., Meisel, T., and Meyer, J.: Accelerating I/O in ESMs using on demand filesystems, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12209, https://doi.org/10.5194/egusphere-egu21-12209, 2021.

14:09–14:11
|
EGU21-14517
Christian Pagé, Maarten Plieger, Wim Som de Cerff, Alessandro Spinuso, Rosa Filgueira, Malcolm Atkinson, Chrysoula Themeli, Iraklis Klampanos, and Vangelis Karkaletsis

It is becoming urgent to anticipate climate impacts and to put adaptation measures in place. In recent years, climate change effects have been producing adverse conditions in many parts of the world, with significant societal and financial impacts. Advanced analysis tools are needed to process ensembles of simulations of the future climate, in order to generate useful and tailored products for end users.

An example of a complex analysis tool used in climate research and adaptation studies is a tool to follow storm tracks. In the context of climate change, it is important to know how storm tracks will change in the future, in both their frequency and intensity. Storms can cause significant societal impacts, hence it is important to assess future patterns. Having access to this type of complex analysis tool is very useful, and integrating such tools with front-ends like the IS-ENES climate4impact (C4I) can enable their use by a larger number of researchers and end users.

Integrating this type of complex tool is not an easy task. It requires significant development effort, especially if one of the objectives is also to adhere to FAIR principles. The DARE Platform enables research developers to implement scientific workflows more rapidly. This work presents how such a complex analysis tool has been implemented to be easily integrated with the C4I platform. The DARE Platform also provides easy access to e-infrastructure services like EUDAT B2DROP to store intermediate or final results, and powerful provenance-powered tools to help researchers manage their work and data.

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreements N°824084 and N°777413.

How to cite: Pagé, C., Plieger, M., Som de Cerff, W., Spinuso, A., Filgueira, R., Atkinson, M., Themeli, C., Klampanos, I., and Karkaletsis, V.: Making Cyclone Tracking accessible to end users for Climate Research and Applications, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14517, https://doi.org/10.5194/egusphere-egu21-14517, 2021.

14:11–14:13
|
EGU21-16185
Sebastian M. Ernst

The Free and Open Source Software (FOSS) ecosystem around Geographic Information Systems (GIS) is currently seeing rapid growth – similar to FOSS ecosystems in other scientific disciplines. At the same time, the need for broad programming and software development skills appears to be becoming a common theme for potential (scientific) users. There is a rather clear boundary between what can be done with Graphical User Interface applications such as QGIS alone on the one hand and with contemporary software libraries on the other – if one actually has the required skill set to use the latter. Practical experience shows that more and more types of research require far more than just rudimentary software development skills. Those can be hard to acquire and distract from the actual scientific work at hand. For instance, the installation, integration and deployment of much-desired software libraries from the field of high-performance computing (HPC), e.g. for general-purpose computing on graphics processing units (GPGPU) or computations on clusters or cloud resources, very often becomes an obstacle in its own right. Recent advances in packaging and deployment systems around popular programming language ecosystems such as Python enable a new kind of thinking, however. Desktop GUI applications can now be combined much more easily with the mentioned type of libraries, which drastically lowers the entry barrier to HPC applications and the handling of large quantities of data. This work aims at providing an overview of the state of the art in this field and showcasing possible techniques.

How to cite: Ernst, S. M.: On combining GUI desktop GIS with computer clusters & cloud resources, the role of programming skills and the state of the art in GUI driven GIS HPC applications, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-16185, https://doi.org/10.5194/egusphere-egu21-16185, 2021.

14:13–14:15
|
EGU21-4895
|
ECS
Stef Smeets, Jaro Camphuijsen, Niels Drost, Fakhereh Alidoost, Bouwe Andela, Berend Weel, Peter Kalverla, Ronald van Haren, Klaus Zimmermann, Jerom Aerts, and Rolf Hut

With the release of the ERA5 dataset, worldwide high-resolution reanalysis data became available with open access for public use. The Copernicus CDS (Climate Data Store) offers two options for accessing the data: a web interface and a Python API. Consequently, automated downloading of the data requires advanced knowledge of Python and a lot of work. To make this process easier, we developed era5cli.

The command line interface tool era5cli enables automated downloading of ERA5 using a single command. All variables and options available in the CDS web form are now available for download in an efficient way. Both the monthly and hourly datasets are supported. Besides automation, era5cli adds several useful functionalities to the download pipeline.

One of the key options in era5cli is to spread one download command over multiple CDS requests, resulting in higher download speeds. Files can be saved in both GRIB and netCDF format with automatic, yet customizable file names. The info command lists the correct names of the available variables and pressure levels for 3D variables. For debugging and testing purposes, the dryrun option can be selected to return only the CDS request. An overview of all available options, including instructions on how to configure your CDS account, is available in our documentation. Recent developments include support for ERA5 back extension and ERA5-Land. The source code for era5cli is available on https://github.com/eWaterCycle/era5cli.
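The idea of spreading one download over several smaller requests can be sketched as follows. This is a simplified illustration and not era5cli's actual implementation: the function `split_requests` and the dictionary keys are hypothetical, and the split granularity (one request per variable and year) is an assumption made for the example.

```python
from itertools import product


def split_requests(variables, start_year, end_year):
    """Split one large download into one request per (variable, year) pair.

    Each returned dictionary describes an independent request that could be
    submitted and downloaded separately, for example in parallel.
    """
    years = range(start_year, end_year + 1)
    return [
        {"variable": variable, "year": year, "months": list(range(1, 13))}
        for variable, year in product(variables, years)
    ]


# Two variables over two years give four independent requests.
requests = split_requests(["2m_temperature", "total_precipitation"], 2008, 2009)
for request in requests:
    print(request["variable"], request["year"])
```

Because each request is self-contained, a failed request can be retried on its own, and printing the request descriptions without submitting them corresponds in spirit to a dry run.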

How to cite: Smeets, S., Camphuijsen, J., Drost, N., Alidoost, F., Andela, B., Weel, B., Kalverla, P., van Haren, R., Zimmermann, K., Aerts, J., and Hut, R.: era5cli: the command line interface to ERA5 data, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-4895, https://doi.org/10.5194/egusphere-egu21-4895, 2021.

14:15–14:17
|
EGU21-16484
Sébastien Denvil, Manuel Fuentes, Matthew Manoussakis, Sebastien Villaume, Tiago Quintino, Simon Smart, and Baudouin Raoult
ECMWF is the European Centre for Medium-Range Weather Forecasts. We are both a research institute and a 24/7 operational service, producing global numerical weather predictions and other data for our Member and Co-operating States and the broader community. The Centre has one of the largest supercomputer facilities and meteorological data archives in the world.
 
ECMWF is about to migrate its 400+ PB of data to its new data centre in Bologna while continuing its operations. We will present and discuss the challenges and opportunities that this migration offers in terms of the evolution of operational practices. The planning, the evolution, and the transition periods of the ECMWF Data Handling System migration to Bologna will be presented.
 
The migration must occur while preserving ECMWF’s product generation and archive services, ensuring appropriate levels of quality of service. The planning and testing of a continuity plan of operations for operational forecasts, Member States' time-critical suites, Copernicus suites (ERA5, CAMS, C3S seasonal and the like), and research suites will be presented. This continuity plan relies on the full identification and traceability of the data flows involved in critical operations. Indeed, it is not economically viable to keep the 400 PB online throughout the migration period.
 
A completely redesigned data services deployment and testing mechanism will be used in the Bologna Data Center. Automation will be paramount in this context, as all our services need to be redeployed entirely and from scratch. This journey will be presented, and the challenges inherent to software-defined infrastructure and services will be discussed.

How to cite: Denvil, S., Fuentes, M., Manoussakis, M., Villaume, S., Quintino, T., Smart, S., and Raoult, B.: ECMWF's data archive and dissemination services migration to the Bologna Data Center, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-16484, https://doi.org/10.5194/egusphere-egu21-16484, 2021.

14:17–15:00