This session aims to highlight Earth Science research concerned with state-of-the-art computational and data infrastructures such as HPC (supercomputers, clusters), Clouds and accelerator-based systems (GPGPU, FPGA).
We will focus on data-intensive (scientific) workflows, from workflows between infrastructures, e.g. European data and compute infrastructures, down to complex analysis workflows on an HPC system, e.g. in situ coupling frameworks.

The session presents an opportunity for everyone to present and learn from results achieved, success stories and experience gathered during the process of study, adaptation and exploitation of these systems.
Further contributions are welcome that showcase middleware and tools developed to support Earth Science applications on HPC systems and Cloud infrastructures, e.g. to increase effectiveness, robustness or ease of use.

Topics of interest include:
- Data intensive Earth Science applications and how they have been adapted to different HPC infrastructures
- Data mining software stacks in use for large environmental data
- HPC simulation and High Performance Data Analytics e.g. code coupling, in-situ workflows
- Experience with Earth Science applications in Cloud environments e.g. solutions on Amazon EC2, Microsoft Azure, and Earth Science simulation codes in private and European Cloud infrastructures (Open Science Cloud)
- Tools and services for Earth Science data management, workflow execution, web services and portals to ease access to compute resources
- Tools and middleware for Earth Science applications on Grid, Cloud and on High Performance Computing infrastructures

Convener: Horst Schwichtenberg | Co-convener: Wim Som de Cerff
| Attendance Thu, 07 May, 16:15–18:00 (CEST)

Chat time: Thursday, 7 May 2020, 16:15–18:00

D791 |
| Highlight
Alexey Gokhberg, Laura Ermert, Jonas Igel, and Andreas Fichtner

The study of ambient seismic noise sources and their time- and space-dependent distribution is becoming a crucial component of the real-time monitoring of various geosystems, including active fault zones and volcanoes, as well as geothermal and hydrocarbon reservoirs. In this context, we have previously implemented a combined cloud-HPC infrastructure for the production of ambient source maps with high temporal resolution. It covers the entire European continent and the North Atlantic, and is based on seismic data provided by the ORFEUS infrastructure. The solution is based on the Application-as-a-Service concept and includes (1) acquisition of data from distributed ORFEUS data archives, (2) noise source mapping, (3) workflow management, and (4) a front-end Web interface for end users.

We present the new results of this ongoing project conducted with support of the Swiss National Supercomputing Centre (CSCS). Our recent goal has been transitioning from mapping the seismic noise sources towards modeling them based on our new method for near real-time finite-frequency ambient seismic noise source inversion. To invert for the power spectral density of the noise source distribution of the secondary microseisms we efficiently forward model global cross-correlation wavefields for any noise distribution. Subsequently, a gradient-based iterative inversion method employing finite-frequency sensitivity kernels is implemented to reduce the misfit between synthetic and observed cross correlations.

During this research we encountered substantial challenges related to the large data volumes and the high computational complexity of the algorithms involved. We handle these problems by using the CSCS massively parallel heterogeneous supercomputer "Piz Daint". We also apply various specialized numerical techniques, which include (1) precomputed Green's function databases generated offline with Axisem and efficiently extracted with the Instaseis package, and (2) our previously developed high-performance package for massive cross-correlation of seismograms using GPU accelerators. Furthermore, due to the inherent restrictions of supercomputers, some crucial components of the processing pipeline, including data acquisition and workflow management, are deployed on the OpenStack cloud environment. The resulting solution combines the specific advantages of the supercomputer and cloud platforms, thus providing a viable distributed platform for the large-scale modeling of seismic noise sources.
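The cross-correlation of seismograms mentioned above can be illustrated with a minimal sketch. This is a naive, CPU-only, time-domain version written for readability; it is not the authors' GPU package, and the function names `cross_correlate` and `best_lag` are hypothetical.

```python
def cross_correlate(a, b, max_lag):
    """Return {lag: sum over t of a[t] * b[t + lag]} for lags in [-max_lag, max_lag]."""
    n = min(len(a), len(b))
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for t in range(n):
            j = t + lag
            if 0 <= j < n:  # skip samples that fall outside the record
                total += a[t] * b[j]
        result[lag] = total
    return result


def best_lag(a, b, max_lag):
    """Lag (in samples) at which the cross-correlation is largest."""
    cc = cross_correlate(a, b, max_lag)
    return max(cc, key=cc.get)


# Two synthetic records: the same pulse, arriving 3 samples later at station b.
station_a = [0.0] * 20
station_b = [0.0] * 20
station_a[5] = 1.0
station_b[8] = 1.0
print(best_lag(station_a, station_b, max_lag=10))
```

Each station pair and each lag is independent of the others, which is presumably what makes the operation a good fit for accelerators when many station pairs have to be processed.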

How to cite: Gokhberg, A., Ermert, L., Igel, J., and Fichtner, A.: Heterogeneous cloud-supercomputing framework for daily seismic noise source inversion, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-11280, https://doi.org/10.5194/egusphere-egu2020-11280, 2020

D792 |
Mattia Santoro, Paolo Mazzetti, Nicholas Spadaro, and Stefano Nativi

The VLab (Virtual Laboratory), developed in the context of the European projects ECOPOTENTIAL and ERA-PLANET, is a cloud-based platform to support environmental scientists in sharing their models. The main challenges addressed by VLab are: (i) minimizing the interoperability requirements in the process of model porting (i.e. simplifying as much as possible the process of publishing and sharing a model for model developers) and (ii) supporting multiple programming languages and environments (it must be possible to port models developed in different programming languages and which use an arbitrary set of libraries).

In this presentation we describe the VLab architecture and, in particular, how it enables a multi-cloud deployment approach and the benefits this brings.

Deploying VLab on different cloud environments allows model execution where it is most convenient, e.g. depending on the availability of required data (move code to data).

This was implemented in the web application for Protected Areas, developed by the Joint Research Centre of the European Commission (EC JRC) in the context of the EuroGEOSS Sprint to Ministerial activity and demonstrated at the last GEO-XVI Plenary meeting in Canberra. The web application demonstrates the use of Copernicus Sentinel data to calculate Land Cover and Land Cover change in a set of Protected Areas belonging to different ecosystems. Based on the user's selection of satellite products, the available cloud platforms on which the model can run are presented along with their data availability for the selected products. After the platform selection, the web application uses the VLab APIs to launch the EODESM (Earth Observation Data for Ecosystem Monitoring) model (Lucas and Mitchell, 2017), monitor the execution status and retrieve the output.

So far, VLab has been tested on the following cloud platforms: Amazon Web Services, three of the 4+1 Copernicus DIAS platforms (namely ONDA, Creodias and Sobloo) and the European Open Science Cloud (EOSC).

Another scenario enabled by this multi-platform deployment feature is to let users choose the computational platform and use their own credentials to request the required computational resources. Finally, the feature can also be exploited to benchmark different cloud platforms with respect to their performance.



Lucas, R. and A. Mitchell (2017). "Integrated Land Cover and Change Classifications". The Roles of Remote Sensing in Nature Conservation, pp. 295–308.


How to cite: Santoro, M., Mazzetti, P., Spadaro, N., and Nativi, S.: Supporting Multi-cloud Model Execution with VLab, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-13518, https://doi.org/10.5194/egusphere-egu2020-13518, 2020

D793 |
Jaro Hokkanen, Jiri Kraus, Andreas Herten, Dirk Pleiter, and Stefan Kollet

  ParFlow is known as a numerical model that simulates the hydrologic cycle from the bedrock to the top of the plant canopy. The original codebase provides an embedded Domain-Specific Language (eDSL) for generic numerical implementations with support for supercomputer environments (distributed memory parallelism), on top of which the hydrologic numerical core has been built.
  In ParFlow, the newly developed optional GPU acceleration is built directly into the eDSL headers such that, ideally, parallelizing all loops in a single source file requires only a new header file. This is possible because the eDSL API is used for looping, allocating memory, and accessing data structures. The decision to embed GPU acceleration directly into the eDSL layer resulted in a highly productive and minimally invasive implementation.
  The eDSL implementation is based on the C host language, and the support for GPU acceleration is based on CUDA C++. CUDA C++ has been under intense development in recent years, and features such as Unified Memory and host-device lambdas were extensively leveraged in the ParFlow implementation in order to maximize productivity. Efficient intra- and inter-node data transfer between GPUs relies on a CUDA-aware MPI library and application-side GPU-based data packing routines.
  The current, moderately optimized ParFlow GPU version runs a representative model up to 20 times faster on a node with 2 Intel Skylake processors and 4 NVIDIA V100 GPUs compared to the original version of ParFlow, where the GPUs are not used. The eDSL approach and ParFlow GPU implementation may serve as a blueprint to tackle the challenges of heterogeneous HPC hardware architectures on the path to exascale.

How to cite: Hokkanen, J., Kraus, J., Herten, A., Pleiter, D., and Kollet, S.: Accelerated hydrologic modeling: ParFlow GPU implementation, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12904, https://doi.org/10.5194/egusphere-egu2020-12904, 2020

D794 |
| Highlight
Arne de Wall, Albert Remke, Thore Fechner, Jan van Zadelhoff, Andreas Müterthies, Sönke Müller, Adrian Klink, Dirk Hinterlang, Matthias Herkt, and Christoph Rath

The Competence Center Remote Sensing of the State Agency for Nature, Environment and Consumer Protection North Rhine-Westphalia (LANUV NRW, Germany) uses data from the Earth observation infrastructure Copernicus to support nature conservation tasks. Large amounts of data and computationally intensive processing chains (ingestion, pre-processing, analysis, dissemination) as well as satellite and in-situ data from many different sources have to be processed to produce statewide information products. Other state agencies and larger local authorities of NRW have similar requirements. Therefore, the state computing center (IT.NRW) has started to develop a Copernicus Data Infrastructure in NRW in cooperation with LANUV, other state authorities and partners from research and industry to meet their various needs. 

The talk presents the results of a pilot project in which the architecture of a Copernicus infrastructure node for the common Spatial Data Infrastructure of the state was developed. It is largely based on cloud technologies (i.a. Docker, Kubernetes). The implementation of the architectural concept included, as a use case, an effective data analysis procedure to monitor orchards in North Rhine-Westphalia. In addition to Sentinel 1 and Sentinel 2 data, the new Copernicus Data Infrastructure processes digital terrain models, digital surface models and LIDAR-based data products. Finally, we will discuss the experience gained, lessons learned, and conclusions for further developments of the Copernicus Data Infrastructure in North Rhine-Westphalia.

How to cite: de Wall, A., Remke, A., Fechner, T., van Zadelhoff, J., Müterthies, A., Müller, S., Klink, A., Hinterlang, D., Herkt, M., and Rath, C.: Copernicus Data Infrastructure NRW, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-17296, https://doi.org/10.5194/egusphere-egu2020-17296, 2020

D795 |
Kor de Jong, Derek Karssenberg, Deb Panja, and Marc van Kreveld

Computer models are built with a specific purpose (or scope) and runtime platform in mind. The purpose of an earth science simulation model might be to be able to predict the spatio-temporal distribution of fresh water resources at a continental scale, and the runtime platform might be a single CPU core in a single desktop computer running one of the popular operating systems. Model size and complexity tend to increase over time, for example due to the availability of more detailed input data. At some point, such models need to be ported to more powerful runtime platforms, containing more cores or nodes that can be used in parallel. This complicates the model code and requires additional skills of the model developer.

Designing models requires the knowledge of domain experts, while developing models requires software engineering skills. By providing facilities for representing state variables and a set of generic modelling algorithms, a modelling framework makes it possible for domain experts without a background in software engineering to create and maintain models. An example of such a modelling framework is PCRaster [3], and examples of models created with it are the PCR-GLOBWB global hydrological and water resources model [2], and the PLUC high resolution continental scale land use change model [4].

Models built using a modelling framework are portable to all runtime platforms on which the framework is available. Ideally, this includes all popular runtime platforms, ranging from shared memory laptops and desktop computers to clusters of distributed memory nodes. In this work we look at an approach for designing a modelling framework for the development of earth science models using asynchronous many-tasks (AMT). AMT is a programming model that can be used to write software in terms of relatively small tasks, with dependencies between them. During the execution of the tasks, new tasks can be added to the set. An advantage of this approach is that it allows for a clear separation of concerns between the model code and the code for executing work. This allows models to be expressed using traditional procedural code, while the work is performed asynchronously, possibly in parallel and distributed.

We designed and implemented a distributed array data structure and an initial set of modelling algorithms on top of HPX [1], an implementation of the AMT programming model. HPX provides a single C++ API for defining asynchronous tasks and their dependencies, which execute locally or on remote nodes. We performed experiments to gain insight into the scalability of the individual algorithms and of simple models in which these algorithms are combined.
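The idea of writing procedural-looking model code whose operations become small asynchronous tasks with dependencies can be sketched as follows. This is only an analogy using futures from Python's standard library on a single machine; it is not HPX and not the framework described here, and the names `scale`, `add` and `run_model` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def scale(values, factor):
    """Task: multiply every element of one array partition by a factor."""
    return [factor * v for v in values]


def add(left, right):
    """Task: add two partitions element by element."""
    return [a + b for a, b in zip(left, right)]


def run_model(partitions):
    """Compute 2 * p + p for every partition p, expressed as dependent tasks."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        pending = []
        for part in partitions:
            # First task: returns immediately with a future.
            doubled = pool.submit(scale, part, 2.0)
            # Second task depends on the first one through its future.
            # It is submitted after its dependency, so with the executor's
            # first-in-first-out queue the dependency has always started
            # by the time this task runs.
            summed = pool.submit(lambda f, p: add(f.result(), p), doubled, part)
            pending.append(summed)
        # Only here does the model code wait for the results.
        return [f.result() for f in pending]


print(run_model([[1.0, 2.0], [3.0]]))
```

The model code reads like a sequence of array operations, while the decision of when and where each task runs is left to the executor. A production AMT runtime would chain continuations instead of blocking inside a task, as the lambda does here for brevity.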

In the presentation we will explain key aspects of the AMT programming model, as implemented in HPX, how we used the programming model in our framework, and the results of our scalability experiments of models built with the framework.

[1] HPX V1.3.0. http://doi.acm.org/10.1145/2676870.2676883, 5 2019.

[2] E. H. Sutanudjaja et al. PCR-GLOBWB 2: a 5 arc-minute global hydrological and water resources model. Geoscientific Model Development Discussions, pages 1–41, dec 2017.

[3] The PCRaster environmental modelling framework. https://www.pcraster.eu

[4] PLUC model. https://github.com/JudithVerstegen/PLUC_Mozambique

How to cite: de Jong, K., Karssenberg, D., Panja, D., and van Kreveld, M.: Towards a scalable framework for earth science simulation models, using asynchronous many-tasks, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18749, https://doi.org/10.5194/egusphere-egu2020-18749, 2020

D796 |
| Highlight
Sebastian Friedemann, Bruno Raffin, Basile Hector, and Jean-Martial Cohard

In situ and in transit computing is an effective way to place postprocessing and preprocessing tasks for large-scale simulations on the high performance computing platform. The resulting proximity between the execution of preprocessing, simulation and postprocessing makes it possible to reduce I/O by bypassing slow and energy-inefficient persistent storage. This in turn makes it possible to scale workflows consisting of heterogeneous components, such as simulation, data analysis and visualization, to modern massively parallel high performance platforms. Reordering the workflow components opens up a multitude of new advanced data processing possibilities for research. In situ and in transit computing are thus vital for advances in the domain of geoscientific simulation, which relies on the increasing amount of sensor and simulation data available.

In this talk, different in situ and in transit workflows, especially those that are useful in the field of geoscientific simulation, are discussed. Furthermore, we present our experiences augmenting ParFlow-CLM, a physically based, state-of-the-art, fully coupled water transfer model for the critical zone, with FlowVR, an in situ framework with a strict component paradigm. This allows shadowed in situ file writing, in situ online steering and in situ visualization.

In situ frameworks can furthermore be coupled to data assimilation tools. In the ongoing EoCoE-II project we propose to embed data assimilation codes into an in transit computing environment. This is expected to enable ensemble-based data assimilation on continental-scale hydrological simulations with multiple thousands of ensemble members.
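The basic in situ pattern, analysing each simulation state while it is still in memory instead of writing every time step to persistent storage first, can be sketched as follows. The toy simulation, the `RunningStats` reducer and all other names are hypothetical stand-ins; the sketch uses neither ParFlow-CLM nor FlowVR.

```python
def simulate(steps, size):
    """Toy simulation: yields (step, state) without writing anything to disk."""
    state = [float(i) for i in range(size)]
    for step in range(steps):
        state = [0.5 * v + step for v in state]
        yield step, state


class RunningStats:
    """In situ analysis: reduces each state to a few numbers as it is produced."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.maximum = float("-inf")

    def update(self, state):
        self.count += len(state)
        self.total += sum(state)
        self.maximum = max(self.maximum, max(state))


def run_in_situ(steps, size):
    stats = RunningStats()
    for _step, state in simulate(steps, size):
        stats.update(state)  # the state is consumed in memory, then discarded
    return stats


stats = run_in_situ(steps=3, size=2)
print(stats.count, stats.total / stats.count, stats.maximum)
```

Only the reduced statistics survive the run, so the I/O volume no longer grows with the number of time steps. In an in transit setup the `update` step would run on separate nodes that receive the state over the network.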

How to cite: Friedemann, S., Raffin, B., Hector, B., and Cohard, J.-M.: In Situ and In Transit Computing for Large Scale Geoscientific Simulation, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21390, https://doi.org/10.5194/egusphere-egu2020-21390, 2020

D797 |
Ezequiel Cimadevilla Alvarez, Aida Palacio Hoz, Antonio S. Cofiño, and Alvaro Lopez Garcia

Data analysis in climate science has traditionally been performed in two different environments: local workstations and HPC infrastructures. Local workstations provide a non-scalable environment in which data analysis is restricted to small datasets that have been downloaded beforehand. HPC infrastructures, on the other hand, provide high computation capabilities by making use of parallel file systems and libraries that allow data analysis to scale. Due to the great increase in the size of datasets and the need to provide computation environments close to data storage, data providers are evaluating the use of commercial clouds as an alternative for data storage. Examples of commercial cloud storage are Google Cloud Storage and Amazon S3, although cloud storage is not restricted to commercial clouds, since several institutions provide private or hybrid clouds. These providers use systems known as “object storage” to provide cloud storage, since they offer great scalability and storage capacity compared to the POSIX file systems found in local or HPC infrastructures.

Cloud storage systems based on object storage are incompatible with the existing libraries and data formats used by the climate community to store and analyse data. Legacy libraries and data formats include netCDF and HDF5, which assume that the underlying storage is a file system rather than an object store. However, new libraries such as Zarr try to solve the problem of storing multidimensional arrays both in file systems and in object stores.
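Why chunked array layouts map well onto object stores can be illustrated with a minimal sketch: each chunk becomes one object under its own key, so reading part of the array needs only a few independent GET-style requests rather than seeks inside one large file. The layout below is a simplified illustration and not Zarr's actual storage format; a plain dictionary stands in for the object store, and the key names and functions are hypothetical.

```python
import json
import struct


def write_chunked(store, name, rows, chunk_rows):
    """Store a 2D array (list of equal-length rows) as one object per row block."""
    meta = {"shape": [len(rows), len(rows[0])], "chunk_rows": chunk_rows}
    store[f"{name}/.meta"] = json.dumps(meta).encode()
    for start in range(0, len(rows), chunk_rows):
        flat = [v for row in rows[start:start + chunk_rows] for v in row]
        # Little-endian float64 values, packed into an opaque byte string.
        store[f"{name}/{start // chunk_rows}"] = struct.pack(f"<{len(flat)}d", *flat)


def read_row(store, name, row):
    """Fetch a single row by reading the metadata object and exactly one chunk."""
    meta = json.loads(store[f"{name}/.meta"])
    cols = meta["shape"][1]
    chunk_index, offset = divmod(row, meta["chunk_rows"])
    raw = store[f"{name}/{chunk_index}"]
    values = struct.unpack(f"<{len(raw) // 8}d", raw)
    return list(values[offset * cols:(offset + 1) * cols])


object_store = {}  # stand-in for a bucket: key -> bytes
data = [[float(r * 3 + c) for c in range(3)] for r in range(5)]
write_chunked(object_store, "temperature", data, chunk_rows=2)
print(sorted(object_store))
print(read_row(object_store, "temperature", 4))
```

Because every chunk is addressed by key, the same layout works on a file system (keys as file paths) and on an object store (keys as object names).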

In this work we present a private cloud infrastructure built upon OpenStack which provides both file system and object storage. The infrastructure also provides an environment, based on JupyterHub, to perform remote data analysis close to the data. This has several advantages from the user's perspective. First, users are not required to deploy the software and tools needed for the analysis. Second, it provides a remote environment where users can perform scalable data analytics. Third, users do not have to download huge amounts of data to their local computer before running the analysis.

How to cite: Cimadevilla Alvarez, E., Palacio Hoz, A., Cofiño, A. S., and Lopez Garcia, A.: Filesystem and object storage for climate data analytics in private clouds with OpenStack, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19280, https://doi.org/10.5194/egusphere-egu2020-19280, 2020

D798 |
Pablo Gamazo, Lucas Bessone, Julián Ramos, Elena Alvareda, and Pablo Ezzatti

Reactive Transport modelling (RTM) involves solving the partial differential equation that governs the transport of multiple chemical components, together with several algebraic equations that account for chemical interactions. Since RTM can be very computationally demanding, especially for long-term and/or large-scale scenarios, several efforts have been made over the last decade to parallelize it. Most works have focused on implementing domain decomposition techniques for distributed memory architectures, and some effort has also gone into shared memory architectures. Despite recent advances in GPUs, only a few works explore this architecture for RTM, and they have mainly focused on implementing parallel sparse matrix solvers for the component transport. Solving the component transport consumes a significant share of the simulation time, but another time-consuming part of RTM is chemical speciation, a process that has to be performed multiple times during the solution of each time step over all nodes (or discrete elements of the mesh). Since speciation involves only local calculations, it is a priori a very attractive process to parallelize. However, to the authors' knowledge, no work in the literature explores the parallelization of chemical speciation on GPUs. One reason might be that the unknowns and the number of chemical equations acting on each node can differ and can change dynamically in time. This can be a drawback for the single instruction, multiple data paradigm, since it may require solving several systems of potentially different sizes across the domain. In this work we use a general formulation that makes it possible to solve chemical speciation efficiently on GPUs. This formulation allows different primary species to be considered for each node of the mesh, and allows new mineral species to precipitate and dissolve completely while keeping the number of components constant.

How to cite: Gamazo, P., Bessone, L., Ramos, J., Alvareda, E., and Ezzatti, P.: Chemical speciation in GPU for the parallel resolution of reactive transport problems, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-1631, https://doi.org/10.5194/egusphere-egu2020-1631, 2019

D799 |
Lucas Bessone, Pablo Gamazo, Julián Ramos, and Mario Storti

GPU architectures are characterized by abundant computing capacity relative to memory bandwidth. This makes them very well suited to problems that are explicit in time and use compact spatial discretizations. Most works using GPUs focus on parallelizing the solvers for the linear systems generated by numerical methods. However, to obtain good performance in numerical applications on GPUs it is crucial to work, preferably, with codes that run entirely on the GPU. In this work we solve a 3D nonlinear diffusion equation using the finite volume method on Cartesian meshes. Two time schemes are compared, explicit and implicit; for the latter we consider the Newton method and a Conjugate Gradient solver for the system of equations. Each scheme is evaluated on CPU and GPU using different metrics to measure performance, accuracy, calculation speed and mesh size. To evaluate the convergence properties of the different schemes with respect to spatial and temporal discretization, an arbitrary analytical solution is proposed, which satisfies the differential equation through a source term chosen accordingly.
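The verification approach described above, choosing an analytical solution and deriving the source term that makes it satisfy the equation, can be sketched for an explicit finite volume scheme. The sketch below is a 1D, CPU-only illustration under assumptions that are not taken from the abstract: the equation u_t = (D(u) u_x)_x + s with D(u) = u, the chosen solution u(x, t) = 2 + exp(-t) sin(pi x) on [0, 1], and forward Euler time stepping.

```python
import math


def exact(x, t):
    """Chosen (manufactured) analytical solution."""
    return 2.0 + math.exp(-t) * math.sin(math.pi * x)


def source(x, t):
    """Source term s = u_t - (u * u_x)_x evaluated for the chosen solution."""
    decay = math.exp(-t)
    s, c = math.sin(math.pi * x), math.cos(math.pi * x)
    u = 2.0 + decay * s
    u_t = -decay * s
    u_x = math.pi * decay * c
    u_xx = -math.pi ** 2 * decay * s
    return u_t - (u_x ** 2 + u * u_xx)


def max_error(n, t_end=0.1):
    """Run the explicit scheme on n cells and return the maximum error at t_end."""
    h = 1.0 / n
    centres = [(i + 0.5) * h for i in range(n)]
    u = [exact(x, 0.0) for x in centres]
    # Time step well below the explicit stability limit (D stays below 3 here).
    steps = math.ceil(t_end / (0.2 * h * h / 3.0))
    dt = t_end / steps
    for k in range(steps):
        t = k * dt
        left, right = exact(0.0, t), exact(1.0, t)  # Dirichlet boundary values
        flux = [0.0] * (n + 1)  # flux[i] crosses the left face of cell i
        flux[0] = -left * (u[0] - left) / (0.5 * h)
        flux[n] = -right * (right - u[n - 1]) / (0.5 * h)
        for i in range(1, n):
            d_face = 0.5 * (u[i - 1] + u[i])  # D(u) = u averaged onto the face
            flux[i] = -d_face * (u[i] - u[i - 1]) / h
        u = [u[i] + dt * (source(centres[i], t) - (flux[i + 1] - flux[i]) / h)
             for i in range(n)]
    return max(abs(u[i] - exact(centres[i], t_end)) for i in range(n))


for cells in (20, 40):
    print(cells, max_error(cells))
```

Comparing the error on successively refined meshes gives an estimate of the observed order of convergence. The update of each cell uses only its two neighbours, which is the kind of compact, explicit stencil the abstract describes as well suited to GPUs.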

How to cite: Bessone, L., Gamazo, P., Ramos, J., and Storti, M.: Performance Evaluation of different time schemes for a Nonlinear diffusion equation on multi-core and many core platforms, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-1632, https://doi.org/10.5194/egusphere-egu2020-1632, 2019

D800 |
Merret Buurman, Sebastian Mieruch, Alexander Barth, Charles Troupin, Peter Thijsse, Themis Zamani, and Naranyan Krishnan

Like most areas of research, the marine sciences are making increasing use of observational data from a multitude of sensors. As it is cumbersome to download, combine and process the increasing volume of data on an individual researcher's desktop computer, many areas of research are turning to web- and cloud-based platforms. Within the SeaDataCloud project, such a platform is being developed together with the EUDAT consortium.

The SeaDataCloud Virtual Research Environment (VRE) is designed to give researchers access to popular processing and visualization tools and to commonly used marine datasets of the SeaDataNet community. Some key aspects such as user authentication, hosting input and output data, are based on EUDAT services, with the perspective of integration into EOSC at a later stage.

The technical infrastructure is provided by five large EUDAT computing centres across Europe, where operational environments are heterogeneous and spatially far apart. The processing tools (pre-existing as desktop versions) are developed by various institutions of the SeaDataNet community. While some of the services interact with users via the command line and can comfortably be exposed as Jupyter notebooks, many of them are very visual (e.g. user interaction with a map) and rely heavily on graphical user interfaces.

In this presentation, we will address some of the issues we encountered while building an integrated service out of the individual applications, and present our approaches to deal with them.

Heterogeneity in operational environments and dependencies is easily overcome by using Docker containers. Leveraging processing resources all across Europe has been the most challenging part so far. Containers are easily deployed anywhere in Europe, but the heavy dependence on (potentially shared) input data, and the possibility that the same data may be used by various services at the same time or in quick succession, mean that data synchronization across Europe has to take place at some point in the process. Designing a synchronization mechanism that does this without conflicts or inconsistencies, or coming up with a distribution scheme that minimizes the synchronization problem, is not trivial.

Further issues came up during the adaptation of existing applications for server-based operation. This includes topics such as containerization, user authentication and authorization and other security measures, but also the locking of files, permissions on shared file systems and exploitation of increased hardware resources.

How to cite: Buurman, M., Mieruch, S., Barth, A., Troupin, C., Thijsse, P., Zamani, T., and Krishnan, N.: A Web-Based Virtual Research Environment for Marine Data, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-17708, https://doi.org/10.5194/egusphere-egu2020-17708, 2020

D801 |
Anton Frank and Jens Weismüller

Information and communication technologies (ICT) play an increasing role in almost every aspect of the environmental sciences. The adoption of new technologies, however, consumes an increasing amount of scientists' time, which they could better spend on their actual research. Not adopting new technologies will lead to biased research, since many researchers are not familiar with the possibilities and methods available through modern technology. This dilemma can only be resolved by close collaboration and scientific partnership between researchers and IT experts. In contrast to traditional IT service provision, IT experts have to understand the scientific problems and methods of the scientists in order to help them select and adapt suitable services. Furthermore, a sound partnership fosters good scientific practice, since the IT experts can ensure the reproducibility of the research by professionalizing workflows and applying FAIR data principles. We elaborate on this dilemma with examples from an IT center’s perspective, and sketch a path towards unbiased research and the development of new IT services that are tailored to the scientific community.

How to cite: Frank, A. and Weismüller, J.: Scientific Partnership - a new level of collaboration between environmental scientists and IT specialists, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-5005, https://doi.org/10.5194/egusphere-egu2020-5005, 2020

D802 |
Dirk Barbi, Nadine Wieters, Luisa Cristini, Paul Gierz, Sara Khosravi, Fatemeh Chegini, Joakim Kjellson, and Sebastian Wahl

Earth system and climate modelling involves the simulation of processes on a large range of scales, and within very different components of the earth system. In practice, component models from different institutes are mostly developed independently, and then combined using a dedicated coupling software.

This procedure leads not only to a wildly growing number of available versions of model components and coupled setups, but also to a specific way of obtaining and operating many of them. This can be a challenging problem (and potentially a huge waste of time), especially for inexperienced researchers or for scientists aiming to change to a different model system, e.g. for intercomparisons.

In order to define a standard way of downloading, configuring, compiling and running modular ESMs on a variety of HPC systems, AWI and partner institutions develop and maintain the open-source ESM-Tools software (https://www.esm-tools.net). Our aim is to provide standard solutions to typical problems occurring within the workflow of model simulations, such as calendar operations, data postprocessing and monitoring, sanity checks, sorting and archiving of output, and script-based coupling (e.g. ice sheet models, isostatic adjustment models). The user only provides a short (30-40 lines) runscript with the strictly necessary experiment-specific definitions, while the ESM-Tools execute the phases of a simulation in the correct order. A user-friendly API ensures that more experienced users have full control over each of these phases and can easily add functionality. A GUI has been developed to provide a more intuitive approach to the modular system, and also to give a graphical overview of the available models and combinations.

With revision 2 (released on March 19th 2019), the ESM-Tools were entirely rewritten, separating the implementation of actions (written in Python 3) from all information on models, coupled setups, software tools, HPC systems etc., which is kept in clearly structured YAML configuration files. This was done to reduce maintenance problems, and to ensure that inexperienced scientists, too, can easily edit configurations, or even add new models or software, without any programming experience. Since revision 3, the ESM-Tools support four ocean models (FESOM1, FESOM2, NEMO, MPIOM), three atmosphere models (ECHAM6, OpenIFS, ICON), two BGC models (HAMOCC, REcoM), an ice sheet model (PISM) and an isostatic adjustment model (VILMA), as well as standard settings for five HPC systems. In the future we plan to add interfaces to regional models and soil/hydrology models.

The Tools currently have more than 70 registered users from 5 institutions, and more than 40 authors of contributions to either model configurations or functionality.

How to cite: Barbi, D., Wieters, N., Cristini, L., Gierz, P., Khosravi, S., Chegini, F., Kjellson, J., and Wahl, S.: ESM-Tools: A common infrastructure for modular coupled earth system modelling, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-8868, https://doi.org/10.5194/egusphere-egu2020-8868, 2020

D803 |
Victor Bacu, Teodor Stefanut, and Dorian Gorgan

Agricultural management relies on good, comprehensive and reliable information on the environment and, in particular, the characteristics of the soil. The soil composition, humidity and temperature can fluctuate over time, leading to migration of plant crops, changes in the schedule of agricultural work, and the treatment of soil by chemicals. Various techniques are used to monitor soil conditions and agricultural activities but most of them are based on field measurements. Satellite data opens up a wide range of solutions based on higher resolution images (i.e. spatial, spectral and temporal resolution). Due to this high resolution, satellite data requires powerful computing resources and complex algorithms. The need for up-to-date and high-resolution soil maps and direct access to this information in a versatile and convenient manner is essential for pedology and agriculture experts, farmers and soil monitoring organizations.

Unfortunately, satellite image processing and interpretation are very specific to each area, time and season, and must be calibrated against real field measurements that are collected periodically. To obtain fairly good soil classification accuracy at very high resolution without interpolating from an insufficient number of measurements, prediction based on artificial intelligence techniques could be used. The use of machine learning techniques is still largely unexplored, and one of the major challenges is the scalability of the soil classification models in three main directions: (a) adding new spatial features (i.e. satellite wavelength bands, geospatial parameters, spatial features); (b) scaling from local to global geographical areas; (c) temporal complementarity (i.e. building up the soil description from samples of satellite data acquired over time: in spring, in summer, in another year, etc.).

The presentation analyses some experiments and highlights the main issues in developing a soil classification model based on Sentinel-2 satellite data, machine learning techniques and high-performance computing infrastructures. The experiments mainly concern the feature and temporal scalability of the soil classification models. The research is carried out using the HORUS platform [1] and the HorusApp application [2], [3], which allows experts to scale the computation over cloud infrastructure.



[1] Gorgan D., Rusu T., Bacu V., Stefanut T., Nandra N., “Soil Classification Techniques in Transylvania Area Based on Satellite Data”. World Soils 2019 Conference, 2 - 3 July 2019, ESA-ESRIN, Frascati, Italy (2019).

[2] Bacu V., Stefanut T., Gorgan D., “Building soil classification maps using HorusApp and Sentinel-2 Products”, Proceedings of the Intelligent Computer Communication and Processing Conference – ICCP, in IEEE press (2019).

[3] Bacu V., Stefanut T., Nandra N., Rusu T., Gorgan D., “Soil classification based on Sentinel-2 Products using HorusApp application”, Geophysical Research Abstracts, Vol. 21, EGU2019-15746, 2019, EGU General Assembly (2019).

How to cite: Bacu, V., Stefanut, T., and Gorgan, D.: Experiments on Machine Learning Techniques for Soil Classification Using Sentinel-2 Products, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-11080, https://doi.org/10.5194/egusphere-egu2020-11080, 2020

D804 |
| Highlight
Maxim Sorokin, Mark Zheleznyak, Sergii Kivva, Pavlo Kolomiets, and Oleksandr Pylypenko

Shallow water flows in coastal areas of seas, rivers and reservoirs are usually simulated with 2-D depth-averaged models. However, the need for fine computational grids and large modeling areas means that practical applications require HPC algorithms and hardware. We compare the computational efficiency of the parallel 2-D modeling system COASTOX on CPU-based multi-processor systems and on GPUs.

The hydrodynamic module of COASTOX is based on the nonlinear shallow water equations (SWE), which describe currents and long waves, including tsunamis, river flood waves and wake waves generated by large vessels in shallow coastal areas. Wave generation by moving vessels is represented by a special pressure term in the momentum equations that depends on the shape of the vessel's draft. Currents generated by wind waves in marine nearshore areas are described by including wave-radiation stress terms in the SWE. Sediment and pollutant transport are described by 2-D advection-diffusion equations with sink-source terms describing sedimentation-erosion and water-bottom contaminant exchange.

The model equations are solved by the finite volume method on rectangular grids or on unstructured grids with triangular cells. The solution scheme for the SWE is Godunov-type, explicit and conservative, and has the TVD property. Second order accuracy in time and space is achieved by a Runge-Kutta predictor-corrector method that uses different methods for calculating fluxes at the predictor and corrector steps. The transport equation schemes are simple upwind schemes of first order in time and space.
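To illustrate what an explicit, conservative Godunov-type finite volume update looks like, here is a minimal sketch. It is not the COASTOX scheme: it is 1-D, first order, uses a flat bottom and the HLL approximate Riemann solver, and all function names and parameter values are assumptions made for this example.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def physical_flux(h, hu):
    """Flux of the 1-D shallow water equations for the state (h, hu)."""
    u = hu / h
    return hu, hu * u + 0.5 * G * h * h


def hll_flux(left, right):
    """HLL approximate Riemann flux between two neighbouring cell states."""
    hl, hul = left
    hr, hur = right
    ul, ur = hul / hl, hur / hr
    cl, cr = math.sqrt(G * hl), math.sqrt(G * hr)
    s_left = min(ul - cl, ur - cr)    # slowest wave speed estimate
    s_right = max(ul + cl, ur + cr)   # fastest wave speed estimate
    fl = physical_flux(hl, hul)
    fr = physical_flux(hr, hur)
    if s_left >= 0.0:
        return fl
    if s_right <= 0.0:
        return fr
    return tuple(
        (s_right * a - s_left * b + s_left * s_right * (qr - ql))
        / (s_right - s_left)
        for a, b, ql, qr in zip(fl, fr, left, right)
    )


def step(cells, dx, cfl=0.4):
    """Advance all cells by one explicit step; returns (new_cells, dt)."""
    max_speed = max(abs(hu / h) + math.sqrt(G * h) for h, hu in cells)
    dt = cfl * dx / max_speed
    # Reflective walls: ghost cells mirror the momentum of the edge cells.
    first, last = cells[0], cells[-1]
    padded = [(first[0], -first[1])] + cells + [(last[0], -last[1])]
    fluxes = [hll_flux(padded[i], padded[i + 1])
              for i in range(len(padded) - 1)]
    new_cells = []
    for i, (h, hu) in enumerate(cells):
        f_in, f_out = fluxes[i], fluxes[i + 1]
        new_cells.append((
            h - dt / dx * (f_out[0] - f_in[0]),
            hu - dt / dx * (f_out[1] - f_in[1]),
        ))
    return new_cells, dt


# Dam-break test: deep water on the left, shallow water on the right.
n, dx = 100, 0.1
cells = [(2.0, 0.0) if i < n // 2 else (1.0, 0.0) for i in range(n)]
mass_before = sum(h for h, _ in cells) * dx
for _ in range(50):
    cells, dt = step(cells, dx)
mass_after = sum(h for h, _ in cells) * dx
print(mass_before, mass_after)
```

Because every interior flux is subtracted from one cell and added to its neighbour, the total water volume is preserved up to round-off, which is what "conservative" means here.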

The model is parallelized for multi-core CPU systems using a domain decomposition approach with halo boundary structures and message-passing updates. The METIS graph partitioning library is used to decompose unstructured model grids. Halo values are updated via MPI using non-blocking send and receive functions.
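The halo idea can be sketched without MPI. The example below is a serial stand-in under stated assumptions: a 1-D grid, a simple 3-point averaging stencil instead of the SWE scheme, and list copies in place of non-blocking MPI messages. All function names are hypothetical.

```python
def smooth(values):
    """One explicit 3-point averaging sweep; the two end values are kept."""
    out = list(values)
    for i in range(1, len(values) - 1):
        out[i] = 0.25 * values[i - 1] + 0.5 * values[i] + 0.25 * values[i + 1]
    return out


def decompose(values, parts):
    """Split a 1-D grid into subdomains with one halo cell on each side."""
    size = len(values) // parts
    subdomains = []
    for p in range(parts):
        start = p * size
        stop = len(values) if p == parts - 1 else start + size
        subdomains.append([0.0] + list(values[start:stop]) + [0.0])
    return subdomains


def exchange_halos(subdomains):
    """Copy neighbours' edge values into halo cells (stand-in for MPI)."""
    for p, sub in enumerate(subdomains):
        if p > 0:
            sub[0] = subdomains[p - 1][-2]   # left neighbour's last owned cell
        if p < len(subdomains) - 1:
            sub[-1] = subdomains[p + 1][1]   # right neighbour's first owned cell


def parallel_step(subdomains):
    """Update every subdomain independently after refreshing the halos."""
    exchange_halos(subdomains)
    updated = []
    for p, sub in enumerate(subdomains):
        new = smooth(sub)
        if p == 0:
            new[1] = sub[1]        # global left boundary value stays fixed
        if p == len(subdomains) - 1:
            new[-2] = sub[-2]      # global right boundary value stays fixed
        updated.append(new)
    return updated


def gather(subdomains):
    """Concatenate the owned cells of all subdomains, dropping the halos."""
    return [v for sub in subdomains for v in sub[1:-1]]


grid = [float(i * i % 7) for i in range(12)]

serial = list(grid)
subs = decompose(grid, 3)
for _ in range(5):
    serial = smooth(serial)
    subs = parallel_step(subs)
decomposed = gather(subs)
print(serial == decomposed)
```

The comparison at the end checks the property that matters for a decomposed explicit scheme: with up-to-date halos, each subdomain computes the same values as the undivided grid.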

For computations on GPUs the model is parallelized using the OpenACC directive-based programming interface. Because the schemes are explicit and local, the numerical schemes of the model are implemented as loops over cells, nodes and faces with independent iterations. The OpenACC directives inserted in the model code therefore tell the compiler which loops may be computed in parallel.

The efficiency of the developed parallel algorithms is demonstrated on CPU and GPU computing systems for the following applications:

  1. Simulation of the extreme flood of July 2008 on the Prut River (Ukraine).
  2. Modeling of ship waves caused by a tanker passage on the San Jacinto River near Barbours Cut Container Terminal (USA) and of the loads on a moored container ship.
  3. Simulation of the consequences of breaches of the dikes constructed on the heavily contaminated floodplain of the Pripyat River upstream of the Chernobyl Nuclear Power Plant.

For parallel performance testing we use a Dell 7920 workstation with two 20-core Intel Xeon Gold 6230 processors and an NVIDIA Quadro RTX 5000 GPU. Multi-core computation is up to 17.3 times faster than single-core computation, with a parallel efficiency of 43%. For large computational grids (about a million nodes or more), the GPU is 47.5-79.6 times faster than a single core and 3-4.6 times faster than the multi-core workstation.
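The quoted parallel efficiency follows from the usual definition, speedup divided by the number of cores. The short check below assumes that all 40 cores of the two processors were used for the best multi-core run; the function name is hypothetical.

```python
def parallel_efficiency(speedup, cores):
    """Parallel efficiency: measured speedup divided by the core count."""
    return speedup / cores


cores = 2 * 20  # two 20-core processors, as stated in the abstract
efficiency = parallel_efficiency(17.3, cores)
print(f"{efficiency:.0%}")
```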

How to cite: Sorokin, M., Zheleznyak, M., Kivva, S., Kolomiets, P., and Pylypenko, O.: High performance computing of waves, currents and contaminants in rivers and coastal areas of seas on multi-processors systems and GPUs, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-11372, https://doi.org/10.5194/egusphere-egu2020-11372, 2020

D805 |
Clara Betancourt, Sabine Schröder, Björn Hagemeier, and Martin Schultz

The Tropospheric Ozone Assessment Report (TOAR) created one of the world’s largest databases for near-surface air quality measurements. More than 150 users from 35 countries have accessed TOAR data via a graphical web interface (https://join.fz-juelich.de) or a REST API (https://join.fz-juelich.de/services/rest/surfacedata/) and downloaded station information and aggregated statistics of ozone and associated variables. All statistics are calculated online from the hourly data that are stored in the database to allow for maximum user flexibility (it is possible, for example, to specify the minimum data capture criterion that shall be used in the aggregation). Thus, it is of paramount importance to measure and, if necessary, optimize the performance of the database and of the web services, which are connected to it. In this work, two aspects of the TOAR database service infrastructure are investigated: Performance enhancements by database tuning and the implementation of flux-based ozone metrics, which – unlike the already existing concentration based metrics – require meteorological data and embedded modeling.
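A minimum data capture criterion of the kind mentioned above can be sketched as follows. This is an illustration only, not the JOIN implementation: the function name, the use of `None` for missing hours and the default threshold of 0.75 are all assumptions.

```python
def aggregate_mean(hourly_values, min_capture=0.75):
    """Mean of the valid hourly values, or None if too few are present.

    Missing hours are represented by None. min_capture is the required
    fraction of valid values (the default is a hypothetical choice).
    """
    if not hourly_values:
        return None
    valid = [v for v in hourly_values if v is not None]
    if len(valid) / len(hourly_values) < min_capture:
        return None
    return sum(valid) / len(valid)


# One day with 20 valid hours and 4 missing hours (capture of about 83 %).
day = [40.0] * 20 + [None] * 4
relaxed = aggregate_mean(day, min_capture=0.75)
strict = aggregate_mean(day, min_capture=0.9)
print(relaxed, strict)
```

The same input yields a value or no value depending on the threshold, which is why computing statistics on request from the hourly data gives users more flexibility than storing pre-aggregated results.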

The TOAR database is a PostgreSQL V10 relational database hosted on a virtual machine, connected to the JOIN web server. In the current set-up the web services trigger SQL queries and the resulting raw data are transferred on demand to the JOIN server and processed locally to derive the requested statistical quantities. We tested the following measures to increase the database performance: optimal definition of indices, server-side programming in PL/pgSQL and PL/Python, on-line aggregation to avoid transfer of large data, and query enhancement by the explain-analyze tool of PostgreSQL. Through a combination of the above-mentioned techniques, the performance of JOIN can be improved by 20-70 %.
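The difference between transferring raw rows and aggregating inside the database can be sketched with Python's built-in `sqlite3` module. SQLite is only a stand-in for PostgreSQL here, the table layout and values are invented for the example, and `EXPLAIN QUERY PLAN` is SQLite's rough counterpart to the explain-analyze tool mentioned above.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the PostgreSQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hourly (station_id INTEGER, hour INTEGER, ozone REAL)"
)
rows = [(s, h, 30.0 + s + (h % 24)) for s in range(3) for h in range(48)]
conn.executemany("INSERT INTO hourly VALUES (?, ?, ?)", rows)

# An index on the columns used for filtering (hypothetical schema).
conn.execute("CREATE INDEX idx_station_hour ON hourly (station_id, hour)")

# Variant 1: transfer all raw rows and aggregate on the client.
raw = conn.execute(
    "SELECT ozone FROM hourly WHERE station_id = ?", (1,)
).fetchall()
client_mean = sum(value for (value,) in raw) / len(raw)

# Variant 2: aggregate inside the database and transfer one number.
(server_mean,) = conn.execute(
    "SELECT AVG(ozone) FROM hourly WHERE station_id = ?", (1,)
).fetchone()

# Inspect how the database intends to execute the aggregate query.
query_plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT AVG(ozone) FROM hourly WHERE station_id = ?",
    (1,),
).fetchall()

print(len(raw), client_mean, server_mean)
for plan_row in query_plan:
    print(plan_row)
conn.close()
```

Both variants give the same mean, but the second one moves a single value across the connection instead of every hourly record.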

Flux-based ozone metrics are necessary for an accurate quantification of ozone damage on vegetation. In contrast to the already available concentration based metrics of ozone, they require the input of meteorological and soil data, as well as a consistent parametrization of vegetation growing seasons and the inclusion of a stomatal flux model. Embedding this model with the TOAR database will make a global assessment of stomatal ozone fluxes possible for the first time ever. This requires new query patterns, which need to merge several variables onto a consistent time axis, as well as more elaborate calculations, which are presently coded in FORTRAN.
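Merging several variables onto a consistent time axis can be sketched as an inner join on timestamps. The function name, variable names and values below are hypothetical, and keeping only timestamps present in every series is one possible policy, not necessarily the one used for the TOAR database.

```python
def merge_on_time(series_by_name):
    """Inner-join several {timestamp: value} series on shared timestamps."""
    if not series_by_name:
        return []
    names = sorted(series_by_name)
    common = set.intersection(*(set(series_by_name[n]) for n in names))
    return [
        (t, {n: series_by_name[n][t] for n in names})
        for t in sorted(common)
    ]


# Invented sample data; ISO timestamps sort correctly as plain strings.
series = {
    "ozone": {"2020-05-07T10:00": 41.0, "2020-05-07T11:00": 44.0,
              "2020-05-07T12:00": 47.0},
    "temperature": {"2020-05-07T11:00": 18.5, "2020-05-07T12:00": 19.2},
    "soil_moisture": {"2020-05-07T10:00": 0.31, "2020-05-07T11:00": 0.30,
                      "2020-05-07T12:00": 0.30},
}

merged = merge_on_time(series)
for timestamp, record in merged:
    print(timestamp, record)
```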

The presentation will show the results of the performance tuning and discuss the pros and cons of various ways in which the ozone flux calculations can be implemented.

How to cite: Betancourt, C., Schröder, S., Hagemeier, B., and Schultz, M.: Performance analysis and optimization of a TByte-scale atmospheric observation database, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-13637, https://doi.org/10.5194/egusphere-egu2020-13637, 2020

D806 |
Donatello Elia, Fabrizio Antonio, Cosimo Palazzo, Paola Nassisi, Sofiane Bendoukha, Regina Kwee-Hinzmann, Sandro Fiore, Tobias Weigel, Hannes Thiemann, and Giovanni Aloisio

Scientific data analysis experiments and applications require software capable of handling domain-specific and data-intensive workflows. The increasing volume of scientific data is further exacerbating these data management and analytics challenges, pushing the community towards the definition of novel programming environments for dealing efficiently with complex experiments, while abstracting from the underlying computing infrastructure. 

ECASLab provides a user-friendly data analytics environment to support scientists in their daily research activities, in particular in the climate change domain, by integrating analysis tools with scientific datasets (e.g., from the ESGF data archive) and computing resources (i.e., Cloud and HPC-based). It combines the features of the ENES Climate Analytics Service (ECAS) and the JupyterHub service, with a wide set of scientific libraries from the Python landscape for data manipulation, analysis and visualization. ECASLab is being set up in the frame of the European Open Science Cloud (EOSC) platform - in the EU H2020 EOSC-Hub project - by CMCC (https://ecaslab.cmcc.it/) and DKRZ (https://ecaslab.dkrz.de/), which host two major instances of the environment. 

ECAS, which lies at the heart of ECASLab, enables scientists to perform data analysis experiments on large volumes of multi-dimensional data by providing a workflow-oriented, PID-supported, server-side and distributed computing approach. ECAS consists of multiple components, centered around the Ophidia High Performance Data Analytics framework, which has been integrated with data access and sharing services (e.g., EUDAT B2DROP/B2SHARE, Onedata), along with the EGI federated cloud infrastructure. The integration with JupyterHub provides a convenient interface for scientists to access the ECAS features for the development and execution of experiments, as well as for sharing results (and the experiment/workflow definition itself). ECAS parallel data analytics capabilities can be easily exploited in Jupyter Notebooks (by means of PyOphidia, the Ophidia Python bindings) together with well-known Python modules for processing and for plotting the results on charts and maps (e.g., Dask, Xarray, NumPy, Matplotlib, etc.). ECAS is also one of the compute services made available to climate scientists by the EU H2020 IS-ENES3 project. 

Hence, this integrated environment represents a complete software stack for the design and execution of interactive experiments as well as complex and data-intensive workflows. One class of such large-scale workflows, efficiently implemented through the environment resources, refers to multi-model data analysis in the context of both CMIP5 and CMIP6 (i.e., precipitation trend analysis orchestrated in parallel over multiple CMIP-based datasets).
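The orchestration pattern of running the same trend analysis over several datasets at once can be sketched with the Python standard library. This is not how ECAS or Ophidia implement it: the datasets are synthetic, the names are hypothetical, and a thread pool only illustrates the fan-out structure rather than true server-side parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def linear_trend(values):
    """Least-squares slope of values against their index (e.g. per year)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


# Synthetic annual precipitation series (mm), one per hypothetical model.
datasets = {
    "model_a": [800.0 + 2.0 * year for year in range(30)],
    "model_b": [900.0 - 1.0 * year for year in range(30)],
    "model_c": [750.0 for _ in range(30)],
}

# Submit one trend computation per dataset and collect the results by name.
with ThreadPoolExecutor(max_workers=3) as pool:
    trends = dict(zip(datasets, pool.map(linear_trend, datasets.values())))

for name, slope in trends.items():
    print(name, slope)
```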

How to cite: Elia, D., Antonio, F., Palazzo, C., Nassisi, P., Bendoukha, S., Kwee-Hinzmann, R., Fiore, S., Weigel, T., Thiemann, H., and Aloisio, G.: A Python-oriented environment for climate experiments at scale in the frame of the European Open Science Cloud, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-17031, https://doi.org/10.5194/egusphere-egu2020-17031, 2020

D807 |
José Manuel Delgado Blasco, Antonio Romeo, David Heyns, Natassa Antoniou, and Rob Carrillo

The OCRE project, an H2020 project funded by the European Commission, aims to increase the usage of Cloud and EO services by the European research community. It makes 9.5M euro of EC funds available, with the aim of removing barriers to service discovery and providing services free at the point of use.

After one year of operation, the OCRE project has completed the gathering of requirements from the European research community, and during Q1 2020 it launched the tenders for the Cloud and EO services.

In the first part of 2020, these tenders will be closed and contracts will be awarded to companies to offer the services for which the project collected requirements during 2019. The selection of these services will be based on the requirements gathered through OCRE's activities in 2019, including online surveys, face-to-face events and interviews. Additionally, OCRE team members participated in workshops and conferences to promote the project and to raise awareness of the possibilities offered by OCRE among both the research and the service provider communities.

In 2020, consumption of the services will start, and OCRE will distribute vouchers to individual researchers and institutions via known research organisations, which will evaluate incoming requests and distribute funds from the European Commission on a regular basis.

This presentation will provide an overview of the possibilities offered by OCRE to researchers interested in boosting their activities using commercial cloud services.

How to cite: Delgado Blasco, J. M., Romeo, A., Heyns, D., Antoniou, N., and Carrillo, R.: OCRE: the game changer of Cloud and EO commercial services usage by the European research community, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-17925, https://doi.org/10.5194/egusphere-egu2020-17925, 2020

D808 |
Diego A. Pérez Montes, Juan A. Añel, and Javier Rodeiro

CONDE (Climate simulation ON DEmand) is the final result of our work and research on climate and meteorological simulations under an HPC as a Service (HPCaaS) model. On our architecture we run very large climate ensemble simulations using an adapted WRF version that is executed on demand, can be deployed on different Cloud Computing environments (such as Amazon Web Services, Microsoft Azure or Google Cloud) and uses BOINC as middleware for task execution and results gathering. We also present some basic examples of applications and experiments to verify that the simulations run in our system are correct and show valid results.

How to cite: Pérez Montes, D. A., Añel, J. A., and Rodeiro, J.: CONDE: Climate simulation ON DEmand using HPCaaS, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-20342, https://doi.org/10.5194/egusphere-egu2020-20342, 2020