Earth science research has become increasingly collaborative. Researchers work together on data, software and algorithms to answer interesting research questions. Teams also share these data and software with other collaborators to refine and improve these products. As data volumes continue to grow, researchers will need new platforms to both enable analysis at scale and to support the sharing of data and software.

Software is critical to the success of science. Creating and using Free and Open Source Software (FOSS) fosters contributions from the scientific community, creates a peer-reviewed and consensus-oriented environment, and promotes the sustainability of science infrastructures.

This session will look at the role of Free and Open Source Software (FOSS), cloud-based architecture solutions, metadata and other user interfaces to support information sharing, scientific collaboration, scientific reproducibility and analytics at scale solutions.

Convener: Jens Klump | Co-conveners: Kaylin Bugbee, Peter Löwe, Rahul Ramachandran
| Attendance Mon, 04 May, 16:15–18:00 (CEST)

Chat time: Monday, 4 May 2020, 16:15–18:00

Chairperson: Peter Löwe
D904 |
Thomas Huang

In recent years, NASA has invested significantly in developing an Analytics Center Framework (ACF) to encapsulate scalable computational and data infrastructures and to harmonize data, tools and computational resources to enable scientific investigations. Since 2017, the Apache Science Data Analytics Platform (SDAP) (https://sdap.apache.org) has been adopted by various NASA-funded projects, including the NASA Sea Level Change Portal, the GRACE and GRACE-FO missions, and the CEOS Ocean Variables Enabling Research and Applications for GEO (COVERAGE) Initiative. While many existing approaches to Earth science analysis focus on collocating all the relevant data under one system running on the cloud, this open source platform empowers global data centers to take a federated analytics approach. With the growing community of SDAP centers, it is now possible for researchers to interactively analyze observational and model data hosted at different centers without having to collocate or download data to their own computing environment. This talk discusses the application of this professional open source big data analytics platform to establish a growing community of SDAP-based ACFs that enable distributed spatiotemporal analysis from any platform, using any programming language.

How to cite: Huang, T.: Open Source Platform for Federated Spatiotemporal Analysis, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-4203, https://doi.org/10.5194/egusphere-egu2020-4203, 2020.

D905 |
Alexander Herzig, Jan Zoerner, John Dymond, Hugh Smith, and Chris Phillips

An Interoperable Low-Code Modelling Framework for Integrated Spatial Modelling

Manaaki Whenua – Landcare Research New Zealand

Modelling complex environmental systems, such as earth surface processes, requires the representation and quantification of multiple individual but connected processes. In the Smarter Targeting Erosion Control (STEC) research programme, we are looking to improve understanding of where erosion occurs, how much and what type of sediment is produced and by which processes, how sediment moves through catchments, and how erosion and sediment transport can be targeted and mitigated cost-effectively. Different research groups involved in the programme will develop different model components representing different processes. To be able to assess the impact of sediment on water quality attributes in the river and to develop effective erosion control measures, the individual models need to be integrated into a composite model.
In this paper we focus on the technical aspects and seamless integration of individual model components utilising the Basic Model Interface (BMI, Peckham et al. 2013) as the interoperability standard and the extension of the LUMASS spatial modelling environment into a BMI-compliant model coupling framework. LUMASS provides a low-code visual development environment for building complex hierarchical system dynamics models that can be run in HPC environments and support sequential and parallel processing of large datasets. Each model developed in the framework can be exposed to other models and frameworks through the BMI-compliant LUMASS engine, without requiring any additional programming, thus greatly simplifying the development of interoperable model components. Here, we concentrate on the integration of BMI-compliant external model components and how they are coupled into the overall model structure.
In the STEC programme, we use LUMASS both for the implementation of model components representing individual soil erosion processes, such as landslides, earthflows, and surficial erosion, and for the integration (i.e. coupling) of other (external) BMI-compliant model components into a composite model. Using available (prototype) models we will demonstrate how LUMASS’ visual development environment can be used to build interoperable integrated component models with very little coding.
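
As a rough illustration of the coupling idea, the sketch below shows two toy components that expose a small, simplified subset of BMI-style control and data functions (`initialize`, `update`, `get_value`, `set_value`, `finalize`) and a driver loop that passes the output of one component to the other. The class names, variable names and numbers are hypothetical and are not taken from LUMASS or the STEC models; the actual BMI specification defines many more functions and different argument conventions.

```python
class ToyComponent:
    """Shared plumbing for the BMI-style functions used in this sketch."""

    def initialize(self, config=None):
        self.config = config or {}
        self.time = 0
        self.values = {}

    def get_value(self, name):
        return self.values[name]

    def set_value(self, name, value):
        self.values[name] = value

    def finalize(self):
        self.values = {}


class HillslopeErosion(ToyComponent):
    """Hypothetical component: sediment yield grows linearly with time."""

    def initialize(self, config=None):
        super().initialize(config)
        self.values["sediment_yield"] = 0.0

    def update(self):
        self.time += 1
        self.values["sediment_yield"] = self.config.get("rate", 1.0) * self.time


class RiverTransport(ToyComponent):
    """Hypothetical component: accumulates the sediment it receives."""

    def initialize(self, config=None):
        super().initialize(config)
        self.values.update(sediment_input=0.0, sediment_load=0.0)

    def update(self):
        self.time += 1
        self.values["sediment_load"] += self.values["sediment_input"]


def run_coupled(n_steps, rate=2.0):
    """Drive both components step by step and exchange one variable."""
    erosion, river = HillslopeErosion(), RiverTransport()
    erosion.initialize({"rate": rate})
    river.initialize()
    for _ in range(n_steps):
        erosion.update()
        # The driver only uses the shared interface, never component internals.
        river.set_value("sediment_input", erosion.get_value("sediment_yield"))
        river.update()
    total = river.get_value("sediment_load")
    erosion.finalize()
    river.finalize()
    return total
```

Because the driver talks to both components only through the shared functions, either component could be swapped for another implementation with the same interface without changing the loop.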

Peckham SD, Hutton EWH, Boyana N 2013. A Component-based approach to integrated modelling in the geosciences: The design of CSDMS. Computers & Geosciences 53: 3–12. http://dx.doi.org/10.1016/j.cageo.2012.04.002

LUMASS: https://bitbucket.org/landcareresearch/lumass 

How to cite: Herzig, A., Zoerner, J., Dymond, J., Smith, H., and Phillips, C.: An Interoperable Low-Code Modelling Framework for Integrated Spatial Modelling, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-20868, https://doi.org/10.5194/egusphere-egu2020-20868, 2020.

D906 |
Meiert W. Grootes, Christiaan Meijer, Zsofia Koma, Bouwe Andela, Elena Ranguelova, and W. Daniel Kissling

LiDAR as a remote sensing technology, enabling the rapid 3D characterization of an area from an air- or spaceborne platform, has become a mainstream tool in the (bio)geosciences and related disciplines. For instance, LiDAR-derived metrics are used for characterizing vegetation type, structure, and prevalence and are widely employed across ecosystem research, forestry, and ecology/biology. Furthermore, these types of metrics are key candidates in the quest for Essential Biodiversity Variables (EBVs) suited to quantifying habitat structure, reflecting the importance of this property in assessing and monitoring the biodiversity of flora and fauna, and consequently in informing policy to safeguard it in the light of climate change and human impact.

In all these use cases, the power of LiDAR point cloud datasets resides in the information encoded within the spatial distribution of LiDAR returns, which can be extracted by calculating domain-specific statistical/ensemble properties of well-defined subsets of points.  

Facilitated by technological advances, the volume of point cloud data sets provided by LiDAR has steadily increased, with modern airborne laser scanning surveys now providing high-resolution, (super-)national scale datasets, tens to hundreds of terabytes in size and encompassing hundreds of billions of individual points, many of which are available as open data.

Representing a trove of data and, for the first time, enabling the study of ecosystem structure at meter resolution over the extent of tens to hundreds of kilometers, these datasets represent highly valuable new resources. However, their scientific exploitation is hindered by the scarcity of Free Open Source Software (FOSS) tools capable of handling the challenges of accessing, processing, and extracting meaningful information from massive multi-terabyte datasets, as well as by the domain-specificity of any existing tools.

Here we present Laserchicken, a FOSS, user-extendable, cross-platform Python tool for extracting user-defined statistical properties of flexibly defined subsets of point cloud data, aimed at enabling efficient, scalable, and distributed processing of multi-terabyte datasets. Laserchicken can be seamlessly employed on computing architectures ranging from desktop systems to distributed clusters, and supports standard point cloud and geo-data formats (LAS/LAZ, PLY, GeoTIFF, etc.), making it compatible with a wide range of (FOSS) tools for geoscience.
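
The general idea of computing statistical properties over subsets of points can be sketched in a few lines of plain Python. The example below is only an illustration under simplifying assumptions: it selects the points that fall inside a vertical cylinder around each target location and summarizes their heights. The function names are hypothetical and this is not the Laserchicken API, which should be consulted for real use.

```python
import math
import statistics


def points_in_cylinder(points, target_xy, radius):
    """Return the (x, y, z) points whose horizontal distance to the target is <= radius."""
    tx, ty = target_xy
    return [p for p in points if math.hypot(p[0] - tx, p[1] - ty) <= radius]


def height_features(points, targets, radius):
    """Compute simple height statistics for the neighbourhood of each target."""
    features = []
    for target in targets:
        heights = [p[2] for p in points_in_cylinder(points, target, radius)]
        if heights:
            features.append({
                "count": len(heights),
                "mean_z": statistics.mean(heights),
                "max_z": max(heights),
                "std_z": statistics.pstdev(heights),
            })
        else:
            # No returns near this target: report an empty neighbourhood.
            features.append({"count": 0, "mean_z": None, "max_z": None, "std_z": None})
    return features
```

A brute-force search like this scans every point for every target; for datasets of the size discussed here, a spatial index and tiling of the input would be needed.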

The Laserchicken feature extraction tool is complemented by a FOSS Python processing pipeline tailored to the scientific exploitation of massive nation-scale point cloud datasets, together forming the Laserchicken framework.

The ability of the Laserchicken framework to unlock nation-scale LiDAR point cloud datasets is demonstrated on the basis of its use in the eEcoLiDAR project, a collaborative project between the University of Amsterdam and the Netherlands eScience Center. Within the eEcoLiDAR project, Laserchicken has been instrumental in defining classification methods for wetland habitats, as well as in facilitating the use of high-resolution vegetation structure metrics in modelling species distributions at national scales, with preliminary results highlighting the importance of including this information.

The Laserchicken Framework rests on FOSS, including the GDAL and PDAL libraries as well as numerous packages hosted on the open source Python Package Index (PyPI), and is itself also available as FOSS (https://pypi.org/project/laserchicken/ and https://github.com/eEcoLiDAR/ ).

How to cite: Grootes, M. W., Meijer, C., Koma, Z., Andela, B., Ranguelova, E., and Kissling, W. D.: Unlocking modern nation-scale LiDAR datasets with FOSS – the Laserchicken framework, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-11674, https://doi.org/10.5194/egusphere-egu2020-11674, 2020.

D907 |
Ward Fisher and Dennis Heimbigner

NetCDF has historically offered two different storage formats for the netCDF data model: files based on the original netCDF binary format, and files based on the HDF5 format. While this has proven effective in the past for traditional disk storage, it is less efficient for modern cloud-focused technologies such as those provided by Amazon S3, Microsoft Azure, IBM Cloud Object Storage, and other cloud service providers. As with the decision to base the netCDF Extended Data Model and File Format on the HDF5 technology, we do not want to reinvent the wheel when it comes to cloud storage. There are a number of existing technologies that the netCDF team can use to implement native object storage capabilities. Zarr enjoys broad popularity within the Unidata community, particularly among our Python users. By integrating support for the latest Zarr specification (while not locking ourselves in to a specific version), we will be able to provide the broadest support for data written by other software packages which use the latest Zarr specification.
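
To illustrate why a chunked layout suits object storage, the following sketch stores each chunk of a small 2D array under its own key in a dictionary that stands in for an object store. The key naming (a `.zarray` metadata document plus dot-separated chunk indices such as `1.0`) follows the convention of the Zarr version 2 format; everything else is simplified for readability (chunks are stored as JSON rather than compressed binary) and does not represent the netCDF implementation.

```python
import json


def write_chunked(store, name, data, chunk_shape):
    """Split a 2D list of numbers into chunks and store one object per chunk."""
    n_rows, n_cols = len(data), len(data[0])
    c_rows, c_cols = chunk_shape
    store[f"{name}/.zarray"] = json.dumps(
        {"shape": [n_rows, n_cols], "chunks": [c_rows, c_cols]}
    )
    for i in range(0, n_rows, c_rows):
        for j in range(0, n_cols, c_cols):
            block = [row[j:j + c_cols] for row in data[i:i + c_rows]]
            store[f"{name}/{i // c_rows}.{j // c_cols}"] = json.dumps(block)


def read_value(store, name, row, col):
    """Fetch a single value by reading the metadata and exactly one chunk."""
    meta = json.loads(store[f"{name}/.zarray"])
    c_rows, c_cols = meta["chunks"]
    key = f"{name}/{row // c_rows}.{col // c_cols}"
    block = json.loads(store[key])
    return block[row % c_rows][col % c_cols], key
```

Reading one value touches only the metadata object and a single chunk object, which is the access pattern that makes such layouts efficient on services where every read is a separate request.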

How to cite: Fisher, W. and Heimbigner, D.: NetCDF in the Cloud: modernizing storage options for the netCDF Data Model with Zarr, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10341, https://doi.org/10.5194/egusphere-egu2020-10341, 2020.

D908 |
Willi Rath, Carsten Schirnick, and Claas Faber

This presentation will detail the design, implementation, and operation of ERDA, which is a collection of external version-controlled research datasets, of multiple synchronized deployments of the data, of a growing set of minimal examples using the datasets from various deployments, of stand-alone tools to create, maintain, and deploy new datasets, and of documentation targeting different audiences (users, maintainers, developers).

ERDA was designed with the following principles in mind: provide clear data provenance and ensure long-term availability; minimize effort for adding data and make all contents available to all users immediately; ensure unambiguous referencing and develop transparent versioning conventions; embrace the mobility of scientists and target independence from the infrastructure of specific institutions.

The talk will show how the data management is done with Git-LFS, demonstrate how data repositories are rendered from human-readable data, and give an overview of the versioning scheme that is applied.
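
For readers unfamiliar with Git LFS: instead of the large file itself, the Git repository stores a small text pointer that records a hash and the size of the content. The sketch below builds such a pointer following the published Git LFS pointer layout (`version`, `oid sha256:…`, `size`). The function name is hypothetical, and the example is unrelated to the ERDA tooling itself.

```python
import hashlib

LFS_SPEC = "https://git-lfs.github.com/spec/v1"


def lfs_pointer(content: bytes) -> str:
    """Return the text of a Git LFS style pointer for the given file content."""
    digest = hashlib.sha256(content).hexdigest()
    return f"version {LFS_SPEC}\noid sha256:{digest}\nsize {len(content)}\n"
```

Since the object identifier is derived from the content, two deployments holding the same pointer can verify that they serve identical data.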

How to cite: Rath, W., Schirnick, C., and Faber, C.: ERDA: External Version-Controlled Research Data, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10939, https://doi.org/10.5194/egusphere-egu2020-10939, 2020.

D909 |
Matthes Rieke, Sebastian Drost, Simon Jirka, and Arne Vogt

Earth Observation data has become available and obtainable in continuously increasing quality as well as spatial and temporal coverage. To deal with the massive amounts of data, the WaCoDiS project aims at developing an architecture that allows its automated processing. The project focuses on the development of innovative water management analytics services based on Earth Observation data such as provided by the Copernicus Sentinel missions. The goal is to improve hydrological models including but not limited to: a) identification of the catchment areas responsible for pollutant and sediment inputs; b) detection of turbidity sources in water bodies and rivers. The central contribution is a system architecture design following the microservice architecture pattern: small components fulfil different tasks and responsibilities (e.g. managing processing jobs, data discovery, process scheduling and execution). In addition, processing algorithms, which are encapsulated in Docker containers, can be easily integrated using the OGC Web Processing Service interface. The orchestration of the different components builds a fully functional ecosystem that is ready for deployment on single machines as well as cloud infrastructures such as a Copernicus DIAS node or commercial cloud environments (e.g. Google Cloud Platform, Amazon Web Services). All components are encapsulated within Docker containers.

The different components are loosely coupled and react to messages and events which are published on a central message broker component. This allows the flexible scaling and deployment of the system. For example, the management components can run in physically different locations than the processing algorithms. Thus, the system supports the reduction of manual work (e.g. identification of relevant input data, execution of algorithms) and minimizes the required interaction of domain users. Once a Processing Job is registered within the system, the user can track its status (e.g. when it was last executed, if an error occurred) and will eventually be informed when new processing results are available.
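
The loose coupling described above can be pictured with a minimal in-process publish/subscribe sketch. It is only an illustration: a real deployment would use a network message broker, and the topic names, message fields and handler roles below are hypothetical rather than part of the WaCoDiS interfaces.

```python
from collections import defaultdict


class MessageBroker:
    """Minimal in-process stand-in for a central message broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in list(self._subscribers[topic]):
            handler(message)


def build_system(broker, log):
    """Wire three independent components together through topics only."""

    def on_new_data(msg):
        log.append(f"scheduler: data {msg['dataset']} available")
        broker.publish("job.execute", {"job": "turbidity", "input": msg["dataset"]})

    def on_execute(msg):
        log.append(f"worker: running {msg['job']} on {msg['input']}")
        broker.publish("product.available", {"job": msg["job"], "status": "finished"})

    def on_product(msg):
        log.append(f"notifier: {msg['job']} {msg['status']}")

    broker.subscribe("data.available", on_new_data)
    broker.subscribe("job.execute", on_execute)
    broker.subscribe("product.available", on_product)
```

None of the handlers calls another directly; each reacts to a topic, so components can be moved, scaled or replaced as long as the messages keep their shape.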

In summary, this work aims to develop a system that allows the automated and event-driven creation of Earth Observation products. It is suitable to run on Copernicus DIAS nodes or on dedicated environments such as a Kubernetes cluster.

In our contribution, we will present the event-driven processing workflows within the WaCoDiS system that enable the automation of water management related analytics services. In addition, we will focus on architectural details of the microservice-oriented system design and discuss different deployment options.

How to cite: Rieke, M., Drost, S., Jirka, S., and Vogt, A.: Event-driven Processing of Earth Observation Data, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-16661, https://doi.org/10.5194/egusphere-egu2020-16661, 2020.

D910 |
Clement Albinet, Sebastien Nouvellon, Björn Frommknecht, Roger Rutakaza, Sandrine Daniel, and Carine Saüt

The ESA-NASA multi-Mission Algorithm and Analysis Platform (MAAP) is dedicated to the BIOMASS [1], NISAR [2] and GEDI [3] missions. This analysis platform will be a virtual open and collaborative environment. The main goal is to bring together data centres (Earth Observation and non-Earth Observation data), computing resources and hosted processing in order to better address the needs of scientists and federate the scientific community.

The MAAP will provide functions to access data and metadata from different sources, such as Earth observation satellite data from science missions; visualisation functions to display the results of the system processing (trends, graphs, maps, etc.) and the results of statistical and analysis tools; collaborative functions to share data, algorithms and ideas between MAAP users; and processing functions, including development environments and an orchestration system that allows users to create and run processing chains from official algorithms.

Currently, the MAAP is in its pilot phase. The architecture for the MAAP pilot foresees two independent elements, one developed by ESA, one developed by NASA, unified by a common user entry point. Both elements will be deployed on Cloud infrastructures. Interoperability between the elements is envisaged for data discovery, data access and identity and access management.

The ESA element architecture is based on technical solutions including: microservices, Docker images and Kubernetes; cloud-based virtual development environments (such as Jupyter or Eclipse Che) for the MAAP algorithm developers; and a framework to create, run and monitor chains of algorithms containerised as Docker images. Interoperability between the ESA and NASA elements will be based on CMR (NASA Common Metadata Repository) and on services based on OGC standards (such as WMS/WMTS, WCS and WPS) and secured with the OAuth2 protocol.

This presentation focuses on the pilot platform and how interoperability between the NASA and ESA elements will be achieved. It also gives insight into the architecture of the ESA element and the technical implementation of this virtual environment. Finally, it will present the very first achievements of the pilot platform and the lessons learned from it.



[1] T. Le Toan, S. Quegan, M. Davidson, H. Balzter, P. Paillou, K. Papathanassiou, S. Plummer, F. Rocca, S. Saatchi, H. Shugart and L. Ulander, “The BIOMASS Mission: Mapping global forest biomass to better understand the terrestrial carbon cycle”, Remote Sensing of Environment, Vol. 115, No. 11, pp. 2850-2860, June 2011.

[2] P.A. Rosen, S. Hensley, S. Shaffer, L. Veilleux, M. Chakraborty, T. Misra, R. Bhan, V. Raju Sagi and R. Satish, "The NASA-ISRO SAR mission - An international space partnership for science and societal benefit", IEEE Radar Conference (RadarCon), pp. 1610-1613, 10-15 May 2015.

[3] https://science.nasa.gov/missions/gedi

How to cite: Albinet, C., Nouvellon, S., Frommknecht, B., Rutakaza, R., Daniel, S., and Saüt, C.: MAAP: The Mission Algorithm and Analysis Platform: A New Virtual and Collaborative Environment for the Scientific Community, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19989, https://doi.org/10.5194/egusphere-egu2020-19989, 2020.

D911 |
Milto Miltiadou, Maria Prodromou, Athos Agapiou, and Diofantos G. Hadjimitsis

DASOS is open source software developed by the authors of this abstract to support the usage of full-waveform (FW) LiDAR data. Traditional LiDAR systems record only a few peak point returns, while FW LiDAR systems digitize the entire backscattered signal returned to the instrument into discrete waveforms. Each waveform consists of a set of equally spaced waveform samples. Extracting peak points from the waveforms reduces the data volume, and the resulting points can be embedded into existing workflows. Nevertheless, this approach discretizes the data. In recent studies, voxelization of FW LiDAR data has become increasingly common. The open source software DASOS uses voxelization for the interpretation of FW LiDAR data and has four main functionalities: (1) extraction of 2D metrics, e.g. height, density, (2) reconstruction of 3D polygonal meshes from the data, (3) alignment with hyperspectral imagery for generating aligned metrics with the FW LiDAR data and colored polygonal meshes, and (4) extraction of local features using 3D windows, e.g. standard deviation of heights within the 3D window.
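
The two steps named above, voxelization and feature extraction within a 3D window, can be sketched as follows. This is a simplified illustration under stated assumptions (waveform samples given as `(x, y, z, intensity)` tuples, cubic voxels, a cubic window measured in voxels); the function names are hypothetical and the code is not taken from DASOS.

```python
import math
import statistics
from collections import defaultdict


def voxelize(samples, voxel_size):
    """Accumulate sample intensities into a sparse grid keyed by voxel index."""
    sums = defaultdict(float)
    for x, y, z, intensity in samples:
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        sums[key] += intensity
    return dict(sums)


def window_height_std(voxels, centre, half_width, voxel_size):
    """Standard deviation of the heights of occupied voxels inside a 3D window."""
    cx, cy, cz = centre
    heights = [
        (k + 0.5) * voxel_size  # height of the voxel centre
        for (i, j, k) in voxels
        if abs(i - cx) <= half_width
        and abs(j - cy) <= half_width
        and abs(k - cz) <= half_width
    ]
    return statistics.pstdev(heights) if heights else 0.0
```

Sliding such a window over the voxel grid yields one feature vector per location, which can then be passed to a classifier.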

Here, we present not only the functionalities of DASOS but also how the extraction of complex structural features from local areas (3D windows) could be used for improving forest inventories. In Southern Australia, dead trees play a substantial role in managing biodiversity since they are more likely to contain hollows and consequently shelter native, protected species. The study area is a native River Red Gum (Eucalyptus camaldulensis) forest. Eucalypt trees are difficult to delineate due to their irregular shapes and multiple trunk splits. Using field data, positive (dead standing trees) and negative (live trees) samples were defined, and for each sample multiple features were extracted using 3D windows from DASOS. With 3D object detection, it was shown that it is possible to detect dead standing trees without tree delineation. The study was further improved with the introduction of multi-scale 3D windows for categorizing trees according to their height and performing a three-pass detection, one pass for each size category. By cross-validating the results, it was shown that the multi-scale 3D-window approach further improved the detection of dead standing Eucalypt trees. The extraction of structural features using DASOS and the methodology implemented could be applied to further forest-related applications.

The project ‘FOREST’ (OPPORTUNITY/0916/0005) is co-financed by the European Regional Development Fund and the Republic of Cyprus through the Research Innovation Foundation.

How to cite: Miltiadou, M., Prodromou, M., Agapiou, A., and Hadjimitsis, D. G.: Structural features extracted from voxelised full-waveform LiDAR using the open source software DASOS for detecting dead standing trees, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10915, https://doi.org/10.5194/egusphere-egu2020-10915, 2020.

D912 |
Zhaokun Zhai, Jianjun Liu, and Yin Gao

Geographic conditions monitoring is an important mission in the geosciences. Its aim is to study, analyze and describe national conditions from a geographic point of view. The National Geographic Conditions Monitoring Project of China, based on remote sensing and geospatial information technology, has acquired large-scale and varied geographic data in China, such as remote sensing images, land cover information and geographic condition elements. The goal of this project is to build the National Geographic Conditions Monitoring Database, which is intended to offer reliable fundamental geoinformation for government decision-making. It plays an important role in natural resources supervision, environmental protection and emergency management. Moreover, it also contributes to the development of the geosciences. However, as China is such a large country, a large quantity of data is produced by many institutions and companies. This makes it difficult to complete data quality checks manually before importing the data into the Oracle Spatial database. In addition, many institutions apply for data every year, and handling these applications also takes considerable time.

Python is an open source programming language. It is user-friendly, has a clear syntax and is easy to learn, and it offers a large number of standard and third-party libraries. We developed many Python scripts for this project. For geodatabase construction, we developed scripts to check the collected data, mainly including directory, structure, attribute and topology checks, to ensure that the data are standardized and correct. Spatial analysis and statistical calculation can also be completed rapidly and accurately using Python scripts. For product supply, we also developed scripts that automatically extract and distribute data from the database for any region.
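
An attribute check of the kind mentioned above might look like the following sketch. The field names, allowed values and rules are hypothetical examples chosen for illustration and are not the project's actual data specification.

```python
# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"id": int, "land_cover": str, "area_m2": float}
VALID_LAND_COVER = {"forest", "water", "cropland", "built_up"}


def check_record(record):
    """Return a list of human-readable problems found in one record."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"wrong type for {field}")
    land_cover = record.get("land_cover")
    if isinstance(land_cover, str) and land_cover not in VALID_LAND_COVER:
        errors.append(f"unknown land_cover: {land_cover}")
    area = record.get("area_m2")
    if isinstance(area, float) and area <= 0:
        errors.append("area_m2 must be positive")
    return errors


def check_dataset(records):
    """Map the index of every faulty record to its list of problems."""
    report = {}
    for index, record in enumerate(records):
        errors = check_record(record)
        if errors:
            report[index] = errors
    return report
```

Returning a report instead of stopping at the first error lets a data producer fix all problems in a delivery at once.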

Tools are critical to the progress and development of science. The application of Python scripts improves the efficiency of our work to some extent, which helps ensure that the project is completed on time every year. The geographic data obtained cover the whole country and contribute to economic and social development and to national strategic decision-making and planning. The source code of these scripts is public, which also helps to optimize and improve them. We believe open source software will play a greater role in the future. Geoscience will benefit as more geographic data is processed and analyzed using open source software.

How to cite: Zhai, Z., Liu, J., and Gao, Y.: Application of Python Script in National Geographic Conditions Monitoring Project of China, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-6897, https://doi.org/10.5194/egusphere-egu2020-6897, 2020.

D913 |
Jaro Camphuijsen, Ronald van Haren, Yifat Dzigan, Niels Drost, Fakhareh Alidoost, Bouwe Andela, Jerom Aerts, Berend van Weel, Rolf Hut, and Peter Kalverla

With the release of the ERA5 dataset, worldwide high resolution reanalysis data became available with open access for public use. The Copernicus CDS (Climate Data Store) offers two options for accessing the data: a web interface and a Python API. Consequently, automated downloading of the data requires advanced knowledge of Python and a lot of work. To make this process easier, we developed era5cli. 

The command line interface tool era5cli enables automated downloading of ERA5 using a single command. All variables and options available in the CDS web form are now available for download in an efficient way. Both the monthly and hourly datasets are supported. Besides automation, era5cli adds several useful functionalities to the download pipeline.

One of the key options in era5cli is to spread one download command over multiple CDS requests, resulting in higher download speeds. Files can be saved in both GRIB and NetCDF format with automatic, yet customizable file names. The `info` command lists correct names of the available variables and pressure levels for 3D variables. For debugging and testing, the `--dryrun` option can be selected to return only the CDS request. An overview of all available options, including instructions on how to configure your CDS account, is available in our documentation. Source code is available at https://github.com/eWaterCycle/era5cli.
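
The idea of spreading one download over several requests can be illustrated with a small sketch that splits a range of years into independent request descriptions. The dictionary keys and the file naming pattern are hypothetical and do not reproduce era5cli internals or the exact CDS request format; in practice the splitting is handled by era5cli itself.

```python
def split_request(variable, start_year, end_year, years_per_request=1):
    """Split one multi-year download into several smaller request descriptions."""
    if end_year < start_year:
        raise ValueError("end_year must not be earlier than start_year")
    requests = []
    for first in range(start_year, end_year + 1, years_per_request):
        last = min(first + years_per_request - 1, end_year)
        requests.append({
            "variable": variable,
            "years": [str(year) for year in range(first, last + 1)],
            # Hypothetical naming pattern, for illustration only.
            "outputfile": f"era5_{variable}_{first}_{last}.nc",
        })
    return requests
```

Each resulting description is independent of the others, so the requests can be submitted and retried separately.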

In this PICO presentation we will provide an overview of era5cli, as well as a short introduction on how to use era5cli.

How to cite: Camphuijsen, J., van Haren, R., Dzigan, Y., Drost, N., Alidoost, F., Andela, B., Aerts, J., van Weel, B., Hut, R., and Kalverla, P.: era5cli: The command line tool to download ERA5 data, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21619, https://doi.org/10.5194/egusphere-egu2020-21619, 2020.

D914 |
Oliver Schmitz, Kor de Jong, and Derek Karssenberg

The heterogeneous nature of environmental systems poses a challenge to researchers constructing environmental models. Many simulation models of integrated systems need to incorporate phenomena that are represented as spatially and temporally continuous fields as well as phenomena that are modelled as spatially and temporally bounded agents. Examples include moving animals (agents) interacting with vegetation (fields) or static water reservoirs (agents) as components of hydrological catchments (fields). However, phenomena bounded in space and time have particular properties mainly because they require representation of multiple (sometimes mobile) objects that each exist in a small subdomain of the space-time domain of interest. Moreover, these subdomains of objects may overlap in space and time such as interleaving branches due to tree crown growth. Efficient storage and access of different types of phenomena requires an approach that integrates representation of fields and objects in a single data model.

We develop the open-source LUE data model that explicitly stores and separates domain information, i.e. where phenomena exist in the space-time domain, and property information, i.e. what attribute value the phenomenon has at a particular space-time location, for a particular object. Notable functionalities are support for multiple spatio-temporal objects, time domains, objects linked to multiple space and time domains, and relations between objects. The design of LUE is based on the conceptual data model of de Bakker (2017) and implemented as a physical data model using HDF5 and C++ (de Jong, 2019). Our LUE data model is part of a new modelling language implemented in Python, allowing for operations accepting both fields and agents as arguments, and therefore resembling and extending the map algebra approach to field-agent modelling.
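
The separation of domain information from property information can be pictured with a small sketch. It assumes, for illustration only, that each object occupies one time interval and one 2D bounding box; the class and method names are hypothetical and do not correspond to the LUE API.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """Where and when an object exists."""
    time_box: tuple   # (t_start, t_end)
    space_box: tuple  # (x_min, y_min, x_max, y_max)

    def contains(self, t, x, y):
        t0, t1 = self.time_box
        x0, y0, x1, y1 = self.space_box
        return t0 <= t <= t1 and x0 <= x <= x1 and y0 <= y <= y1


@dataclass
class PropertySet:
    """Domains and property values are stored separately, linked by object id."""
    domains: dict = field(default_factory=dict)      # object id -> Domain
    properties: dict = field(default_factory=dict)   # name -> {object id -> value}

    def add_object(self, object_id, domain, **values):
        self.domains[object_id] = domain
        for name, value in values.items():
            self.properties.setdefault(name, {})[object_id] = value

    def objects_at(self, t, x, y):
        """Ids of all objects whose domain covers the given space-time location."""
        return sorted(
            oid for oid, dom in self.domains.items() if dom.contains(t, x, y)
        )

    def value(self, name, object_id):
        return self.properties[name][object_id]
```

Because domains are kept apart from values, overlapping objects such as neighbouring tree crowns can share a location without their attributes being merged.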

We present the conceptual and physical data models and illustrate their usage by implementing a spatial agent-based model simulating changes in human nutrition. We thereby consider the interaction between personal demand and the supply of healthy food from nearby stores, as well as the influence of each agent's social network.


de Bakker, M. P., de Jong, K., Schmitz, O., & Karssenberg, D. (2017). Design and demonstration of a data model to integrate agent-based and field-based modelling. Environmental Modelling & Software, 89, 172–189. https://doi.org/10.1016/j.envsoft.2016.11.016

de Jong, K., & Karssenberg, D. (2019). A physical data model for spatio-temporal objects. Environmental Modelling & Software. https://doi.org/10.1016/j.envsoft.2019.104553

LUE source code repository: https://github.com/pcraster/lue/

How to cite: Schmitz, O., de Jong, K., and Karssenberg, D.: Integrated field-agent based modelling using the LUE scientific data base, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-8583, https://doi.org/10.5194/egusphere-egu2020-8583, 2020.

D915 |
Massimiliano Cannata, Milan Antonovic, Nils Oesterling, and Sabine Brodhag

The shallow underground is of primary importance in governing and planning the territories where we live. In fact, the uppermost 500 meters below the ground surface are affected by a growing number of human activities such as construction, extraction of drinking water and mineral resources, installation of geothermal probes, etc. Borehole data are therefore essential, as they reveal at specific locations the vertical sequence of geological layers, which in turn can provide an understanding of the geological conditions we can expect in the shallow underground. Unfortunately, data are rarely available in a FAIR way, that is, as the acronym specifies, Findable, Accessible, Interoperable and Reusable.

Most of the time, data, particularly those collected in the past, are in the form of static data reports that describe the stratigraphy and the related characteristics; these data types are generally available as paper documents or static files such as .pdf or images (.ai). While very informative, these documents are not searchable, interoperable or easily reusable, since they require a non-negligible time for data integration. Sometimes data are archived in a database. This certainly improves the findability and accessibility of the data but still does not address the interoperability requirement, and therefore combining data from different sources remains a problematic task. To enable FAIR borehole data and facilitate their management by different entities (public or private), swisstopo (www.swisstopo.ch) has funded the development of a Web application named Borehole Data Management System (BDMS) [1] that adopts the borehole data model [2] implemented by the Swiss Geological Survey.

Among the benefits of adopting a standard model we can identify:

  • Enhance the exchange, the usage and quality of the data
  • Reach data harmonization (level of detail, precise definitions, relationships and dependencies among the data)
  • Establish a common language between stakeholders

The Borehole Data Management System (BDMS) was developed using the latest Free and Open Source technologies. The new application integrates some of today’s best OSGeo projects and is available as a modular open source solution on GitHub and, ready to use, as a Docker container on Docker Hub. There are two types of authorization. Explorer users are able to search the BDMS for specific boreholes, navigate a configurable, user-friendly map, apply filters, explore the stratigraphy layers of each borehole and export all the data as Shapefiles, CSV or PDF. Editors are able to manage the information in detail and publish the results after passing a validation process.



[1] http://geoservice.ist.supsi.ch/docs/bdms/index.html

[2] https://www.geologieportal.ch/en/knowledge/lookup/data-models/borehole-data-model.html 

How to cite: Cannata, M., Antonovic, M., Oesterling, N., and Brodhag, S.: Borehole Data Management System: a web interface for borehole data acquisition, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19423, https://doi.org/10.5194/egusphere-egu2020-19423, 2020.

D916 |
Felix Bachofer, Thomas Esch, Jakub Balhar, Martin Boettcher, Enguerran Boissier, Mattia Marconcini, Annekatrin Metz-Marconcini, Michal Opletal, Fabrizio Pacini, Tomas Soukup, Vaclav Svaton, and Julian Zeidler

Urbanization is among the most relevant global trends, affecting climate and the environment, as well as the health and socio-economic development of a majority of the global population. As such, it poses a major challenge for the current urban population and the well-being of the next generation. To understand how to take advantage of opportunities and properly mitigate the negative impacts of this change, we need precise and up-to-date information on urban areas. The Urban Thematic Exploitation Platform (UrbanTEP) is a collaborative system which focuses on processing earth observation (EO) data and delivering multi-source information on trans-sectoral urban challenges.

The U-TEP is developed to provide end-to-end and ready-to-use solutions for a broad spectrum of users (service providers, experts and non-experts) to extract the unique information and indicators required for urban management and sustainability. Key components of the system are an open, web-based portal connected to distributed high-level computing infrastructures and providing key functionalities for

i) high-performance data access and processing,

ii) modular and generic state-of-the-art pre-processing, analysis, and visualization,

iii) customized development and sharing of algorithms, products and services, and

iv) networking and communication.

The service and product portfolio provides access to the archives of the Copernicus and Landsat missions, Datacube technology, DIAS processing environments, as well as premium products like the World Settlement Footprint (WSF). External service providers, as well as researchers, can make use of on-demand processing of new data products and of the possibility of developing and deploying new processors. The onboarding of service providers, developers and researchers is supported by the Network of Resources program of the European Space Agency (ESA) and the OCRE initiative of the European Commission.

To provide end-to-end solutions, the VISAT tool on UrbanTEP allows users to analyze and visualize project-related geospatial content and to develop storylines that convey research output effectively to customers and stakeholders. Multiple visualizations (scopes) are already predefined. One available scope illustrates the exploitation of the WSF-Evolution dataset by analyzing settlement and population development in South-East Asian countries from 1985 to 2015 in the context of the Sustainable Development Goal (SDG) 11.3.1 indicator. Other open scopes focus on, for example, urban green, functional urban areas, land use and urban heat island modelling.
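SDG indicator 11.3.1 is the ratio of land consumption rate to population growth rate. The sketch below shows one commonly cited logarithmic formulation of that ratio. It is not taken from UrbanTEP or VISAT: the function names and the input figures are illustrative assumptions, and the official indicator metadata has been revised over time, so the current UN definition should be checked before reuse.

```python
import math


def land_consumption_rate(urban_start, urban_end, years):
    """Annualised log growth rate of built-up (settlement) area."""
    return math.log(urban_end / urban_start) / years


def population_growth_rate(pop_start, pop_end, years):
    """Annualised log growth rate of population."""
    return math.log(pop_end / pop_start) / years


def lcrpgr(urban_start, urban_end, pop_start, pop_end, years):
    """Ratio of land consumption rate to population growth rate."""
    pgr = population_growth_rate(pop_start, pop_end, years)
    if pgr == 0:
        raise ValueError("population growth rate is zero; the ratio is undefined")
    return land_consumption_rate(urban_start, urban_end, years) / pgr


# Made-up figures: settlement area in km^2 and population, 30 years apart.
ratio = lcrpgr(urban_start=100.0, urban_end=200.0,
               pop_start=1_000_000, pop_end=1_500_000, years=30)
print(f"LCRPGR = {ratio:.2f}")
```

In this formulation, a value above 1 means that settlement area grew faster than population over the period, and a value below 1 means the opposite.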

How to cite: Bachofer, F., Esch, T., Balhar, J., Boettcher, M., Boissier, E., Marconcini, M., Metz-Marconcini, A., Opletal, M., Pacini, F., Soukup, T., Svaton, V., and Zeidler, J.: Urban Thematic Exploitation Platform - supporting urban research with EO data processing, integrative data analysis and reporting, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-1325, https://doi.org/10.5194/egusphere-egu2020-1325, 2020.

D917 |
Chandra Taposeea-Fisher, Andrew Groom, Jon Earl, and Peter Van Zetten

Our ability to observe the Earth is transforming, with substantially more satellite imagery and geospatial data fuelling big data-driven opportunities to better monitor and manage the Earth and its systems. CGI’s GeoData360 solves common technical challenges for those aiming to exploit these new opportunities.

Reliable monitoring solutions that run efficiently at scale require substantial ICT resources and more sophisticated data processing capabilities that can be complex and costly. Cloud-based resources enable new approaches using large, multi-tenant infrastructures, enabling solutions to benefit from massive infrastructural resources, otherwise unattainable for the individual user. GeoData360 makes these opportunities accessible to a wide user base.

GeoData360 is CGI’s cloud-hosted production platform for Earth Observation (EO) and Geospatial services. GeoData360 is designed for long running, large scale production pipelines as a Platform-as-a-Service. It supports deep customisation and extension, enabling production workflows that consume large volumes of EO and Geospatial data to run cost efficiently at scale.

GeoData360 is fully scalable, works dynamically and optimises the use of infrastructure resources available from commercial cloud providers, whilst also reducing elapsed processing times. It has the advantage of being portable and securely deployable within public or private cloud environments. Its operational design provides the reliable, consistent performance needed for commercially viable services. The platform is aimed at big data, with production capabilities applicable to services based on EO imagery and other Geospatial data (climate data, meteorological data, points, lines, polygons etc.). GeoData360 has been designed to support cost effective production, with applications using only the resources that are required.

CGI has already used GeoData360 as an enabling technology on EO and non-EO initiatives, benefitting from: (1) granularity, with containerisation at the level of the individual processing step, allowing increased flexibility, efficient testing and implementation, and improved optimisation potential for dynamic scaling; (2) standardisation, with a centralised repository of standardised processing steps enabling efficient re-use for rapid prototyping; (3) orchestration and automation, by linking process steps into complete processing workflows, enabling the granular approach and reducing operational costs; (4) dynamic scaling, for processing resources and for storage; (5) inbuilt monitoring with graphical feedback providing transparency on system performance, allowing system control to be maintained for highly automated workflows; (6) data access, with efficient access to online archives; (7) security, with access control and protection for third-party intellectual property. Example initiatives that benefit from GeoData360 include PASSES (Peatland Assessment in SE Asia via Satellite) and HiVaCroM (High Value Crop Monitoring). Both initiatives have used GeoData360 to enable data-intensive production workflows to be deployed and run at national to regional scales.
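The combination of a central repository of standardised steps and their orchestration into workflows can be pictured with a small sketch. This is not GeoData360 code or its API: the registry, the step names and the data passed between steps are hypothetical, and plain Python functions stand in for what the platform runs as containerised processing steps.

```python
from typing import Callable

# Hypothetical central registry mapping a step name to its implementation.
STEP_REGISTRY: dict[str, Callable[[dict], dict]] = {}


def register_step(name):
    """Decorator that adds a processing step to the registry under `name`."""
    def decorator(func):
        STEP_REGISTRY[name] = func
        return func
    return decorator


@register_step("ingest")
def ingest(state):
    # Stand-in for fetching input scenes from an online archive.
    return {**state, "scenes": ["scene-a", "scene-b"]}


@register_step("classify")
def classify(state):
    # Stand-in for a per-scene processing algorithm.
    return {**state, "classified": [f"{s}:classified" for s in state["scenes"]]}


def run_workflow(step_names, state=None):
    """Run registered steps in order, recording each completed step in a log."""
    state = dict(state or {})
    for name in step_names:
        state = STEP_REGISTRY[name](state)
        state["log"] = state.get("log", []) + [name]
    return state


result = run_workflow(["ingest", "classify"])
print(result["log"])
```

Because each step only reads and returns a state dictionary, steps can be tested individually and reused in other workflows, which mirrors the granularity and re-use arguments made above.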

GeoData360 solves the challenges of providing production-ready offerings: reliability, repeatability, traceability and monitoring. Our solution solves the scaling issues inherent in batch processing large volumes of bulky data and decouples the algorithms from the underlying infrastructure. GeoData360 provides a trusted component in the development, deployment and successful commercialisation of big data-driven solutions.

How to cite: Taposeea-Fisher, C., Groom, A., Earl, J., and Van Zetten, P.: CGI GeoData360: a cloud-based scalable production platform for big data-driven solutions of Earth Observation and Geospatial services, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10967, https://doi.org/10.5194/egusphere-egu2020-10967, 2020.

D918 |
Ulrich Leopold, Benedikt Gräler, Henning Bredel, J. Arturo Torres-Matallana, Philippe Pinheiro, Mickaël Stefas, Thomas Udelhoven, Jeroen Dries, Bernard Valentin, Leslie Gale, Philippe Mougnaud, and Martin Schlerf

We present an implementation of a time series analysis toolbox for remote sensing imagery in R, which has been largely funded by the European Space Agency within the PROBA-V MEP Third Party Services project. The toolbox is developed according to the needs of the time series analysis community. The data are provided by the PROBA-V mission exploitation platform (MEP) at VITO. The toolbox largely builds on existing specialized R packages and functions for raster and time series analysis, combining these in a common framework.

To ease access to and usage of the toolbox, it has been deployed in the MEP Spark Cluster to bring the algorithm to the data. All functions are also wrapped in a Web Processing Service (WPS) using 52°North's WPS4R extension for interoperability across web platforms. The WPS can be orchestrated in the Automatic Service Builder (ASB) developed by Space Applications. Hence, the space-time analytics developed in R can be integrated into a larger workflow, potentially integrating external data and services. The WPS provides a web client, including a preview of the results in a map window, for usage within the MEP. Results are offered for download or through Web Mapping and Web Coverage Services (WMS, WCS) provided through a GeoServer instance.
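Because the functions are exposed through a standard WPS interface, any HTTP client can discover them. The sketch below builds key-value-pair request URLs, assuming the WPS 1.0.0 encoding. The endpoint URL and the process identifier are placeholders, not the actual EOTSA service address or process names, and no request is sent.

```python
from urllib.parse import urlencode

# Placeholder endpoint; the real service URL is not given in the abstract.
WPS_ENDPOINT = "https://example.org/wps"


def wps_url(endpoint, request, **extra):
    """Build a WPS key-value-pair request URL for the given operation."""
    params = {"service": "WPS", "request": request}
    params.update(extra)
    return f"{endpoint}?{urlencode(params)}"


# List the processes the service offers.
capabilities_url = wps_url(WPS_ENDPOINT, "GetCapabilities")

# Ask for the inputs and outputs of one process (hypothetical identifier).
describe_url = wps_url(
    WPS_ENDPOINT,
    "DescribeProcess",
    version="1.0.0",
    identifier="example.timeseries.trend",
)

print(capabilities_url)
print(describe_url)
```

The returned XML documents describe the available processes and their parameters, which is the information an orchestration tool needs to chain a process into a larger workflow.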

Through its interoperability features the EOTSA toolbox provides a contribution towards collaborative science.

How to cite: Leopold, U., Gräler, B., Bredel, H., Torres-Matallana, J. A., Pinheiro, P., Stefas, M., Udelhoven, T., Dries, J., Valentin, B., Gale, L., Mougnaud, P., and Schlerf, M.: The Earth Observation Time Series Analysis Toolbox (EOTSA) - An R package with WPS, Web-Client and Spark integration, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21974, https://doi.org/10.5194/egusphere-egu2020-21974, 2020.