Earth systems science is fundamentally cross-disciplinary, and increasingly this requires sharing and exchange of geoscientific information across discipline boundaries. This information can be both rich and complex, and content is not always readily interpretable by either humans or machines. Difficulties arise through differing exchange formats, lack of common semantics, divergent access mechanisms, etc.

Recent developments in distributed, service-oriented, information systems using web-based (W3C, ISO, OGC) standards are leading to advances in data interoperability. At the same time, work is underway to understand how meaning may be represented using ontologies and other semantic mechanisms, and how this can be shared with other scientists.

This session aims to explore developments in interoperable data sharing, and the representation of semantic meaning to enable interpretation of geoscientific information. Topics may include, but are not limited to:
- standards-based information modelling
- interoperable data sharing
- use of metadata
- knowledge representation
- use of semantics in an interoperability context
- application of semantics to discovery and analysis
- metadata and collaboration

Please note: abstracts chosen for presentation during the ESSI 2.1 session will be considered for publication in a Special Issue of the International Journal of Geo-Information (IJGI): https://www.mdpi.com/journal/ijgi, titled "On Denotation and Connotation in Web Semantics, Collaboration and Metadata". More information at this link: https://www.mdpi.com/journal/ijgi/special_issues/denotation_connotation

Convener: Paolo Diviacco | Co-conveners: Kristine Asch, Paolo Mazzetti
| Attendance Mon, 04 May, 08:30–10:15 (CEST)

Chat time: Monday, 4 May 2020, 08:30–10:15

D879 |
Aaron Kaulfus, Kaylin Bugbee, Alyssa Harris, Rahul Ramachandran, Sean Harkins, Aimee Barciauskas, and Deborah Smith

Algorithm Theoretical Basis Documents (ATBDs) accompany Earth observation data generated from algorithms. ATBDs describe the physical theory, mathematical procedures and assumptions made for the algorithms that convert radiances received by remote sensing instruments into geophysical quantities. While ATBDs are critical to scientific reproducibility and data reuse, there have been technical, social and informational issues surrounding the creation and maintenance of these key documents. A standard ATBD structure has been lacking, resulting in inconsistent documents of varying levels of detail. Due to the lack of a minimum set of requirements, there has been very little formal guidance on the ATBD publication process. Additionally, ATBDs have typically been provided as static documents that are not machine readable, making search and discovery of the documents and the content within the documents difficult for users. To address the challenges surrounding ATBDs, NASA has prototyped the Algorithm Publication Tool (APT), a centralized cloud-based publication tool that standardizes the ATBD content model and streamlines the ATBD authoring process. This presentation will describe our approach in developing a common information model for ATBDs and our efforts to provide ATBDs as dynamic documents that are available for both human and machine utilization. We will also include our vision for APT within the broader NASA Earth science data system and how this tool may assist in standardizing and easing the ATBD creation and maintenance process.

How to cite: Kaulfus, A., Bugbee, K., Harris, A., Ramachandran, R., Harkins, S., Barciauskas, A., and Smith, D.: Ensuring Scientific Reproducibility within the Earth Observation Community: Standardized Algorithm Documentation for Improved Scientific Data Understanding, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-3663, https://doi.org/10.5194/egusphere-egu2020-3663, 2020.

D880 |
Alaitz Zabala Torres, Joan Masó Pau, and Xavier Pons

The first approach to metadata was based on the producer's point of view, since producers were responsible for documenting and sharing metadata about their products. Since 2012, starting in the EU FP7 GeoViQua project, the Geospatial User Feedback approach (GUF, an OGC standard since 2016) has described the user perspective on datasets and services. In the past, users of the data gained knowledge about and with the data, but they lacked the means to easily and automatically share this knowledge in a formal way.

In the EU H2020 NextGEOSS project, the NiMMbus system has been matured as an interoperable solution to manage and store feedback items following the OGC GUF standard. NiMMbus can be used as a component for any geospatial portal, and, so far, has been integrated in several H2020 project catalogues or portals (NextGEOSS, ECOPotential, GeoEssential and GroundTruth2.0).

User feedback metadata complements producer's metadata and adds value to the resource description in a geospatial portal by collecting the knowledge gained by the user while using the data for the purpose originally foreseen by the producer or an innovative one.

The current GEOSS platform provides access to endless data resources. But to truly assist decision making, GEOSS wants to add a knowledge base. We believe that the NiMMbus system is a significant NextGEOSS contribution in this direction.

This communication describes how to extend the GUF to provide a set of knowledge elements and connect them to the original data, creating a network of knowledge. These elements can be citations (publications and policy briefs), quality indications (QualityML vocabulary and ISO 19157), usage reports (code and analytical processes), etc. NiMMbus offers tools to create different levels of feedback, from adding comments and providing citations to extracting quality indicators for the different quality classes (positional, temporal and attribute accuracy, completeness, consistency), and to share them with other users as part of the user feedback and usage report. Usage reports in the GUF standard can be extended to include code fragments that other users can apply to reproduce a previous usage. For example, in the ECOPotential Protected Areas from Space map browser (continued in the H2020 e-Shape project), a user can encode a vegetation index that is optimal for observing phenological blooms as a layer calculation combining original Sentinel-2 bands. The portal stores this as JavaScript code (serialized as JSON) that describes which layers and formula were used. Once a user has validated the new layer, they can decide to make it available to everyone by publishing it as open source JavaScript code in the NiMMbus system. From then on, any other user of the portal can import it and use it. As the usage description is a full feedback item, the user creating the dynamic layer can also add any other related information, such as comments, or advertise a related publication.
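As an illustration of a usage description that records which layers and formula were used, here is a minimal sketch in Python (the portal itself uses JavaScript). The JSON field names, the choice of NDVI as the index, and the toy reflectance values are all assumptions made for this example; they are not taken from the GUF standard or from NiMMbus.

```python
import json

# Hypothetical usage description: which layers were combined and how.
# Field names are illustrative only, not the GUF or NiMMbus schema.
usage_json = """
{
  "title": "NDVI from Sentinel-2",
  "layers": {"nir": "B08", "red": "B04"},
  "formula": "(nir - red) / (nir + red)"
}
"""

def apply_usage(usage: dict, bands: dict) -> list:
    """Recompute a derived layer, pixel by pixel, from a stored usage description."""
    names = list(usage["layers"])
    columns = [bands[usage["layers"][name]] for name in names]
    result = []
    for values in zip(*columns):
        scope = dict(zip(names, values))
        # eval is tolerable only for this trusted toy formula;
        # a real portal would need a safe expression parser.
        result.append(eval(usage["formula"], {"__builtins__": {}}, scope))
    return result

usage = json.loads(usage_json)
bands = {"B08": [0.5, 0.6], "B04": [0.1, 0.2]}  # toy reflectance values
ndvi = apply_usage(usage, bands)
```

Because the description carries both the band mapping and the formula, a second user who imports it can reproduce the same derived layer on their own data.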

The system shifts the focus to the users who share their experience of the data, and complements the producer's documentation with the richness of the knowledge that users gain in their data-driven research. In addition to augmenting GEOSS data, the system enables a social network of knowledge.

How to cite: Zabala Torres, A., Masó Pau, J., and Pons, X.: Managing the knowledge created by the users trough Geospatial User Feedback system. The NEXTGEOSS use case, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18976, https://doi.org/10.5194/egusphere-egu2020-18976, 2020.

D881 |
Charlotte Pascoe, David Hassell, Martina Stockhause, and Mark Greenslade
The Earth System Documentation (ES-DOC) project aims to nurture an ecosystem of tools & services in support of Earth System documentation creation, analysis and dissemination. Such an ecosystem enables the scientific community to better understand and utilise Earth system model data.
The ES-DOC infrastructure that lets Coupled Model Intercomparison Project Phase 6 (CMIP6) modelling groups describe their climate models and make the documentation available on-line has been available for 18 months. More recently, the automatic generation of documentation for every published simulation has meant that every CMIP6 dataset within the Earth System Grid Federation (ESGF) is now immediately connected to the ES-DOC description of the entire workflow that created it, via a “further info URL”.
The further info URL is a landing page from which all of the CMIP6 documentation relevant to the data may be accessed, including experimental design, model formulation and ensemble description, as well as links to the data citation information.
These DOI landing pages are part of the Citation Service, provided by DKRZ. Data citation information is also available independently through the ESGF Search portal or in the DataCite search or Google’s dataset search. It provides users of CMIP6 data with the formal citation that should accompany any use of the datasets that comprise their analysis.
ES-DOC services and the Citation Service form a CMIP6 project collaboration, and depend upon structured documentation provided by the scientific community. Structured scientific metadata has an important role in science communication; however, its creation and collation exacts a cost in time, energy and attention. We discuss progress towards a balance between the ease of information collection and the complexity of our information handling structures.
CMIP6: https://pcmdi.llnl.gov/CMIP6/
ES-DOC: https://es-doc.org/
Further Info URL: https://es-doc.org/cmip6-ensembles-further-info-url

Citation Service: http://cmip6cite.wdc-climate.de

How to cite: Pascoe, C., Hassell, D., Stockhause, M., and Greenslade, M.: Advances in Collaborative Documentation Support for CMIP6, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19636, https://doi.org/10.5194/egusphere-egu2020-19636, 2020.

D882 |
Martin Schiegl, Gerold W. Diepolder, Abdelfettah Feliachi, José Román Hernández Manchado, Christine Hörfarter, Olov Johansson, Andreas-Alexander Maul, Marco Pantaloni, László Sőrés, and Rob van Ede

In geosciences, where nomenclature has naturally grown from regional approaches with limited cross-border harmonization, descriptive texts are often used for coding data whose meanings in the international context are not conclusively clarified. This leads to difficulties when cross-border datasets are compiled. On the one hand, this is caused by the national-language, regional and historical descriptions in geological map legends. On the other hand, it is related to the interdisciplinary orientation of the geosciences, e.g. when concepts adopted from different areas have a different meaning. A consistent use and interpretation of data according to international standards creates the potential for semantic interoperability. Datasets then fit into international data infrastructures. But what if interpretation according to international standards is not possible, because there are none, or existing standards are not applicable? Then efforts can be made to create machine-readable data using knowledge representations based on Semantic Web and Linked Data principles.

By making concepts referenceable via uniform identifiers (HTTP URIs) and crosslinking them to other resources published on the web, Linked Data offers the necessary context for clarifying the meaning of concepts. This modern technology and approach ideally complements mainstream GIS (Geographic Information System) and relational database technologies in making data findable and semantically interoperable.

The GeoERA project (Establishing the European Geological Surveys Research Area to deliver a Geological Service for Europe, https://geoera.eu/) therefore provides the opportunity to clarify expert knowledge and terminology in the form of project-specific vocabulary concepts on a scientific level and to use them in datasets to code data. At the same time, parts of this vocabulary might later be included in international standards (e.g. INSPIRE or GeoSciML), if desired. So-called “GeoERA Project Vocabularies” are open collections of knowledge that, for example, may also contain deprecated, historical or only regionally relevant terms. In an ideal overall view, the sum of all vocabularies results in a knowledge database of bibliographically referenced terms that have been developed through scientific projects. Due to the consistent application of the data standards of Semantic Web and Linked Data, nothing stands in the way of further use by modern technologies such as AI.

Project Vocabularies could also form an initial part of a future EGDI (European Geological Data Infrastructure, http://www.europe-geology.eu/) knowledge graph. They are restricted to linguistically labelled concepts, described in SKOS (Simple Knowledge Organization System), plus metadata properties with a focus on scientific reusability. To extend this knowledge graph, they could also be supplemented by RDF data files to support project-related applications and functionality.
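To make the idea of a linguistically labelled SKOS concept with an HTTP URI more concrete, the following is a minimal sketch that serialises one concept as Turtle using only the Python standard library. The concept URIs under example.org, the labels, and the `concept_to_turtle` helper are hypothetical and are not taken from the GeoERA vocabularies; only the SKOS namespace and the `Concept`, `prefLabel` and `exactMatch` terms come from the SKOS specification. String escaping of labels is deliberately omitted.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

def concept_to_turtle(uri, pref_labels, exact_match=None):
    """Serialise one SKOS concept as Turtle text (labels are not escaped)."""
    lines = [f"<{uri}> a <{SKOS}Concept> ;"]
    for lang, label in sorted(pref_labels.items()):
        lines.append(f'    <{SKOS}prefLabel> "{label}"@{lang} ;')
    if exact_match:
        # Crosslink to a concept published elsewhere on the web.
        lines.append(f"    <{SKOS}exactMatch> <{exact_match}> ;")
    # Replace the trailing ';' of the last statement with the terminating '.'
    lines[-1] = lines[-1][:-1] + "."
    return "\n".join(lines)

turtle = concept_to_turtle(
    "https://example.org/vocab/lithology/granite",  # hypothetical URI
    {"en": "granite", "de": "Granit"},
    exact_match="https://example.org/standard/lithology/granite",  # hypothetical
)
print(turtle)
```

The multilingual labels and the `exactMatch` link are what allow a regionally defined term to be related to a concept in an international vocabulary.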

How to cite: Schiegl, M., Diepolder, G. W., Feliachi, A., Hernández Manchado, J. R., Hörfarter, C., Johansson, O., Maul, A.-A., Pantaloni, M., Sőrés, L., and van Ede, R.: Semantic harmonization of geoscientific data sets using Linked Data and project specific vocabularies, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-9412, https://doi.org/10.5194/egusphere-egu2020-9412, 2020.

D883 |
Rainer Haener, Henning Lorenz, Sylvain Grellet, Marc Urvois, and Eberhard Kunz

This study presents an approach on how to establish Conceptual Interoperability for autonomous, multidisciplinary systems participating in Research Infrastructures, Early Warning, or Risk Management Systems. Although promising implementations already exist, true interoperability is far from being achieved. Therefore, reference architectures and principles of Systems-of-Systems are adapted for a fully specified, yet implementation-independent Conceptual Model, establishing interoperability to the highest possible degree. The approach utilises use cases and requirements from geological information processing and modelling within the European Plate Observing System (EPOS).

Conceptual Interoperability can be accomplished by enabling Service Composability. Unlike integration, composability allows interactive data processing and, beyond that, evolving systems that enable interpretation and evaluation by any potential participant. Integrating data from different domains often leads to monolithic services that are implemented only for a specific purpose (Stovepipe System). Consequently, composability is essential for collaborative information processing, especially in modern interactive computing and exploration environments. A major design principle for achieving composability is Dependency Injection, allowing flexible combinations (Loose Coupling) of services that implement common, standardised interfaces (abstractions). Another decisive factor for establishing interoperability is the use of Metamodels of data models that specify data and semantics regardless of their domain, based on a common, reusable approach. Thus, data from different domains can be represented by one common encoding that e.g. abstracts landslides (geophysical models) or buildings (urban planning) based on their geometry. An indispensable part of a Conceptual Model is detailed semantics, which not only requires terms from Domain-Controlled Vocabularies, but also ontologies providing qualified statements about the relationship between data and associated concepts. This is of major importance for evolutionary systems that are able to comprehend and react to state changes. Maximum interoperability also requires strict modularisation for a clear separation of semantics, metadata and the data itself.
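The Dependency Injection principle mentioned above can be sketched in a few lines of Python. All class names and values here are hypothetical and chosen only to mirror the landslide and building example; this is not the authors' architecture. The consumer depends only on a shared abstraction, and concrete services are passed in from outside.

```python
from typing import Protocol

class GeometrySource(Protocol):
    """Common abstraction that every injected service must implement."""
    def footprints(self) -> list[tuple[str, float]]: ...

class LandslideService:
    def footprints(self):
        return [("landslide-1", 1200.0)]  # toy (id, area in square metres)

class BuildingService:
    def footprints(self):
        return [("building-7", 350.0)]

class AreaReport:
    """Depends only on the abstraction; concrete sources are injected."""
    def __init__(self, sources: list[GeometrySource]):
        self.sources = sources

    def total_area(self) -> float:
        return sum(area for src in self.sources for _, area in src.footprints())

# Services from different domains are composed without AreaReport knowing them.
report = AreaReport([LandslideService(), BuildingService()])
total = report.total_area()
```

Adding a third domain service requires no change to `AreaReport`, which is the loose coupling that the paragraph contrasts with stovepipe systems.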

Conceptual models for geological information that are governed by the described principles, and their implementations, are still far away. Moreover, the route to achieving such models is not straightforward. They span a multitude of communities and are far too complex for conventional implementation in project form. A first step could be applying modern design principles to new developments in the various scientific communities and joining the results under a common stewardship like the Open Geospatial Consortium (OGC). Recently, a Metamodel has been developed within the OGC’s Borehole Interoperability Experiment (BoreholeIE), initiated and led by the French Geological Survey (BRGM). It combines the ISO standard (19148:2012 linear referencing) for localisation along borehole paths with the adaptation of different encodings of borehole logs based on well-established OGC standards. Further developments aim at correlating borehole logs, geological or geotechnical surveys, and geoscientific models. Since results of surveys are often only available as non-schematised interpretations in text form, interoperability requires formal classifications, which can be derived from machine learning methods applied to the interpretations. As part of a Conceptual Model, such classifications can be used for an automated exchange of standard-conformant borehole logs or to support the generation of expert opinions on soil investigations.
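The basic idea behind localisation along a borehole path, namely expressing a position as a distance measured along a linear element, can be sketched as follows. This is only an illustration of that single idea under simplifying assumptions (a straight-segment trajectory in a local Cartesian frame); it is not an implementation of ISO 19148 or of the BoreholeIE Metamodel, and the `locate` function and sample trajectory are hypothetical.

```python
import math

def locate(path, measured_depth):
    """Return the (x, y, z) point at a given distance along a polyline path."""
    if measured_depth < 0:
        raise ValueError("measured depth must be non-negative")
    travelled = 0.0
    for start, end in zip(path, path[1:]):
        seg = math.dist(start, end)  # length of this trajectory segment
        if seg > 0 and measured_depth <= travelled + seg:
            t = (measured_depth - travelled) / seg
            # Linear interpolation between the two segment end points.
            return tuple(a + t * (b - a) for a, b in zip(start, end))
        travelled += seg
    raise ValueError("measured depth exceeds borehole length")

# Toy trajectory: 100 m vertical, then a 50 m horizontal section.
path = [(0.0, 0.0, 0.0), (0.0, 0.0, -100.0), (30.0, 40.0, -100.0)]
point = locate(path, 125.0)
```

Log intervals recorded as "from" and "to" distances along the path can then be placed in 3D by calling the same function for both ends.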

How to cite: Haener, R., Lorenz, H., Grellet, S., Urvois, M., and Kunz, E.: Towards an ontology based conceptual model, establishing maximum interoperability for interactive and distributed processing of geoscientific information, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10227, https://doi.org/10.5194/egusphere-egu2020-10227, 2020.

D884 |
John S. Hughes and Daniel J. Crichton

The PDS4 Information Model (IM) Version was released for use in December 2019. The ontology-based IM remains true to its foundational principles found in the Open Archival Information System (OAIS) Reference Model (ISO 14721) and the Metadata Registry (MDR) standard (ISO/IEC 11179). The standards generated from the IM have become the de facto data archiving standards for the international planetary science community and have successfully scaled to meet the requirements of the diverse and evolving planetary science disciplines.

A key foundational principle is the use of a multi-level governance scheme that partitions the IM into semi-independent dictionaries. The governance scheme first partitions the IM vertically into three levels, the common, discipline, and project/mission levels. The IM is then partitioned horizontally across both discipline and project/mission levels into individual Local Data Dictionaries (LDDs).

The Common dictionary defines the classes used across the science disciplines such as product, collection, bundle, data formats, data types, and units of measurement. The dictionary resulted from a large collaborative effort involving domain experts across the community. An ontology modeling tool was used to enforce a modeling discipline, for configuration management, to ensure consistency and extensibility, and to enable interoperability. The Common dictionary encompasses the information categories defined in the OAIS RM, specifically data representation, provenance, fixity, identification, reference, and context. Over the last few years, the Common dictionary has remained relatively stable in spite of requirements levied by new missions, instruments, and more complex data types.

Since the release of the Common dictionary, the creation of a significant number of LDDs has proved the effectiveness of multi-level, steward-based governance. This scheme is allowing the IM to scale to meet the archival and interoperability demands of the evolving disciplines. In fact, an LDD development “cottage industry” has emerged that required improvements to the development processes and configuration management.  An LDD development tool now allows dictionary stewards to quickly produce specialized LDDs that are consistent with the Common dictionary.

The PDS4 Information Model is a world-class knowledge-base that governs the Planetary Science community's trusted digital repositories. This presentation will provide an overview of the model and additional information about its multi-level governance scheme including the topics of stewardship, configuration management, processes, and oversight.

How to cite: Hughes, J. S. and Crichton, D. J.: Information Model Governance for Diverse Disciplines, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10600, https://doi.org/10.5194/egusphere-egu2020-10600, 2020.

D885 |
| solicited
| Highlight
Jeff de La Beaujardiere

The geosciences are facing a Big Data problem, particularly in the areas of data Volume (huge observational datasets and numerical model outputs), Variety (large numbers of disparate datasets from multiple sources with inconsistent standards), and Velocity (need for rapid processing of continuous data streams). These challenges make it difficult to perform scientific research and to make decisions about serious environmental issues facing our planet. We need to enable science at the scale of our large, disparate, and continuous data.

One part of the solution relates to infrastructure, such as by making large datasets available in a shared environment co-located with computational resources so that we can bring the analysis code to the data instead of copying data. The other part relies on improvements in metadata, data models, semantics, and collaboration. Individual datasets must have comprehensive, accurate, and machine-readable metadata to enable assessment of their relevance to a specific problem. Multiple datasets must be mapped into an overarching data model rooted in the geographical and temporal attributes to enable us to seamlessly find and access data for the appropriate location and time. Semantic mapping is necessary to enable data from different disciplines to be brought to bear on the same problem. Progress in all these areas will require collaboration on technical methods, interoperability standards, and analysis software that bridges information communities -- collaboration driven by a willingness to make data usable by those outside of the original scientific discipline.
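As a small illustration of a data model rooted in geographical and temporal attributes, the sketch below filters a catalogue by bounding box and time range. The record structure, dataset names and extents are hypothetical, and the overlap test ignores complications such as bounding boxes that cross the antimeridian.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DatasetRecord:
    name: str
    bbox: tuple  # (west, south, east, north) in degrees
    start: date
    end: date

def overlaps(rec, bbox, start, end):
    """True if the record intersects the query box and the query time range."""
    w, s, e, n = bbox
    rw, rs, re_, rn = rec.bbox
    spatial = rw <= e and re_ >= w and rs <= n and rn >= s
    temporal = rec.start <= end and rec.end >= start
    return spatial and temporal

# Hypothetical catalogue entries from two different disciplines.
catalogue = [
    DatasetRecord("alpine-boreholes", (5.0, 44.0, 15.0, 48.0),
                  date(2010, 1, 1), date(2015, 12, 31)),
    DatasetRecord("pacific-sst", (150.0, -10.0, 180.0, 10.0),
                  date(2012, 1, 1), date(2020, 12, 31)),
]

query_bbox = (8.0, 45.0, 12.0, 47.0)
hits = [r.name for r in catalogue
        if overlaps(r, query_bbox, date(2014, 6, 1), date(2014, 6, 30))]
```

Once every dataset exposes the same location and time attributes, one query of this kind can be run across otherwise unrelated collections.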

How to cite: de La Beaujardiere, J.: Enabling Science at Scale, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12079, https://doi.org/10.5194/egusphere-egu2020-12079, 2020.

D886 |
Fan Wu, Hong Gao, and Zhaoyuan Yu

A conceptual consensus on, as well as a unified representation of, a given geographic concept across multiple contexts can be of great significance to the communication, retrieval, combination, and reuse of geographic information and knowledge. However, a geographic concept is a rich synthesis of semantics, semiotics, and quality (e.g., vagueness or approximation). The generation, representation, calculation, and application of a given geographic concept can consequently be highly heterogeneous, especially considering different interests, domains, languages, etc. In light of these semantic heterogeneity problems, coding core concepts uniquely can be a lighter alternative to traditional ontology-based methods, because numeric codes can symbolise a consensus on a concept across domains and even languages. Consequently, this paper proposes a unified semantic model as well as an encoding framework for the representation, reasoning, and computation of geographic concepts based on geometric algebra (GA). In this method, a geographic concept is represented as a collection of semantic elements, which can be further encoded based on its hierarchical structure, and all the semantic information of the concept is preserved across the encoding process. On the basis of the encoding result, semantic information can be reasoned backward by well-defined operators, and semantic similarity can be computed for information inference as well as semantic association retrieval. In the case study, the implementation of the proposed framework shows that this GA-based semantic encoding model can be a promising method for the unified expression, reasoning, and calculation of geographic concepts, and can reasonably be regarded as a prospective lighter-weight solution to semantic heterogeneity.

How to cite: Wu, F., Gao, H., and Yu, Z.: Dealing with Semantic Heterogeneity of Geographic Concepts: A Geometric Algebra-Based Encoding Method, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-2117, https://doi.org/10.5194/egusphere-egu2020-2117, 2020.

D887 |
Data Management at CEDA
not presented
Kate Winfield

Sending data to a secure long-term archive is increasingly a necessity for science projects due to funding body and publisher requirements. It is also good practice for long-term scientific aims and enables the preservation and re-use of valuable research data. The Centre for Environmental Data Analysis (CEDA) hosts a data archive holding vast amounts of atmospheric and earth observation data from sources including aircraft campaigns, satellites, pollution, automatic weather stations, climate models, etc. The CEDA archive currently holds 14 PB of data in over 250 million files, which makes it challenging to discover and access specific data. In order to manage this, it is necessary to use standard formats and descriptions of the data. This poster will explore best practice in data management at CEDA and show tools used to archive and share data.

How to cite: Winfield, K.: Data Management at CEDA, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-2375, https://doi.org/10.5194/egusphere-egu2020-2375, 2020.

D888 |
Marc Urvois, Sylvain Grellet, Abdelfettah Feliachi, Henning Lorenz, Rainer Haener, Christian Brogaard Pedersen, Martin Hansen, Luca Guerrieri, Carlo Cipolloni, and Mary Carter

The European Plate Observing System (EPOS, www.epos-ip.org) is a multidisciplinary pan-European research infrastructure for solid Earth science. It integrates a series of domain-specific service hubs such as the Geological Information and Modelling Technical Core Service (TCS GIM), which is dedicated to providing access to data, data products and services on European boreholes, geological and geohazard maps, and mineral resources, as well as a catalogue of 3D models. These are hosted by European Geological Surveys and national research organisations.

Even though interoperability implementation frameworks are well described and used (ISO, OGC, IUGS/CGI, INSPIRE …), several data providers found it difficult to deploy, from the outset, the OGC services required to support the full semantic definition (OGC Complex Feature) for discovering and viewing millions of geological entities. Instead, data are first collected and exposed using a simpler yet standardised description (GeoSciML Lite & EarthResourceML Lite). Subsequently, the more complex data flows are deployed with the corresponding semantics.

This approach was applied to design and implement the European Borehole Index and associated web services (View-WMS and Discovery-WFS) and extended to 3D Models. TCS GIM exposes to EPOS Central Integrated Core Services infrastructure a metadata catalogue service, a series of “index services”, a codeList registry and a Linked Data resolver. These allow EPOS end users to search and locate boreholes, geological maps and features, 3D models, etc., based on the information held by the index services.

In addition to these services, TCS GIM focussed particularly on sharing European geological data using the Linked Data approach. Each instance is associated with a URI and points to other information resources also using URIs. The Linked Data principles ensure the best semantic description (e.g. URIs to shared codeList registries entries) and also enrich an initial “information seed” (e.g. a set of Borehole entries matching a search) with more contents (e.g. URIs to more Features or a more complex description). As a result, this pattern including Simple Feature and Linked Data has a positive effect on the IT architecture: interoperable services are simpler and faster to deploy and there is no need to harvest a full OGC Complex Feature dataset. This architecture is also more scalable and sustainable.

The European Geological Services codeList registries have been enriched with new vocabularies as part of the European Geoscience Registry. In compliance with the relevant European INSPIRE rules, this registry is now part of the INSPIRE Register Federation, the central access point to the repository for vocabulary and resources. The European Geoscience Registry is available for reuse and extension by other geoscientific projects.

During the EPOS project, this approach has been developed and implemented for the Borehole and Model data services. The TCS GIM team provided feedback on INSPIRE through the Earth Science Cluster, contributed to the creation of the OGC GeoScience Domain Working Group in 2017 and the launch of the OGC Borehole Interoperability Experiment in 2018, and proposed evolutions to the OGC GeoSciML and IUGS/CGI EarthResourceML standards.

How to cite: Urvois, M., Grellet, S., Feliachi, A., Lorenz, H., Haener, R., Brogaard Pedersen, C., Hansen, M., Guerrieri, L., Cipolloni, C., and Carter, M.: Open access to geological information and 3D modelling data sets in the European Plate Observing System platform (EPOS), EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-5131, https://doi.org/10.5194/egusphere-egu2020-5131, 2020.

D889 |
Hassan Babaie and Armita Davarpanah

We model the intermittent, non-linear interactions and feedback loops of the complex rare earth elements (REE) mineral system by applying the self-organized criticality concept. Our semantic knowledge model (REE_MinSys ontology) represents dynamic primary and secondary processes that occur over a wide range of spatial and temporal scales and produce the emergent REE deposits and their geometry, tonnage, and grade. These include the scale-invariant, out-of-equilibrium geodynamic and magmatic processes that lead to the formation of orthomagmatic (carbonatite, alkaline igneous rocks) and syn- and post-magmatic hydrothermal REE deposits. The ontology also represents the redistribution of the REE from these primary ores by metamorphic fluids and/or post-depositional surface and supergene processes in sedimentary basins, fluvial channels, coastal areas, and/or regolith around or above them. The ontology applies concepts of complex systems theory to represent the spatial and spatio-temporal elements of the REE mineral system such as source, driver, threshold barriers, trigger, avalanche, conduit, relaxation, critical point attractor, and self-organization for the alkaline igneous, iron oxide (subcategory of IOCG), hydrothermal, marine placer, alluvial placer (including paleo-placer), phosphorite, laterite, and ion-adsorption clay REE deposits. The ontology is instantiated with diverse data drawn from globally distributed types of well-studied small to giant REE deposits to build the REE_MinSys knowledge base. Users can query the data in the knowledge base to extract explicit and inferred facts about each type of REE deposit, for example by asking: “Which rare earth elements are in REE phosphate deposits?”; “Which rare earth elements are largely explored in REE placer deposits?” Data from the knowledge base will be divided into training and testing sets after they are preprocessed and trends and data patterns are identified through data analytical procedures. The training and test datasets will be used to build models applying machine learning algorithms to predict potential REE deposits of different kinds in unexposed or covered areas.
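The division into training and testing sets can be sketched as below. The record fields, the 25% test fraction and the fixed seed are assumptions made for the example; the abstract does not state how the authors split or sample their data.

```python
import random

def split(records, test_fraction=0.25, seed=42):
    """Shuffle reproducibly and return (training, testing) lists."""
    shuffled = records[:]  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical preprocessed records exported from a knowledge base.
records = [{"id": i, "deposit_type": "placer" if i % 2 else "carbonatite"}
           for i in range(20)]
training, testing = split(records)
```

Fixing the seed keeps the split reproducible, so that models built later can be compared on the same held-out records.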

How to cite: Babaie, H. and Davarpanah, A.: A classification and predictive model of the complex REE mineral system, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-6324, https://doi.org/10.5194/egusphere-egu2020-6324, 2020.

D890 |
Robert Huber, Anusuriya Devaraju, Michael Diepenbroek, Uwe Schindler, Roland Koppe, Tina Dohna, Egor Gordeev, and Marianne Rehage

Pressing environmental and societal challenges demand the reuse of data on a much larger scale. Central to improvements on this front are approaches that support structured and detailed descriptions of published data. In general, the reusability of scientific datasets, such as measurements generated by instruments, observations collected in the field, and model simulation outputs, requires information about the contexts in which they were produced. These contexts include the instrumentation, methods, and analysis software used. In current data curation practice, data providers often put significant effort into capturing descriptive metadata about datasets. Nonetheless, metadata about instruments and methods provided by data authors are limited, and in most cases are unstructured.

The ‘Interoperability’ principle of FAIR emphasizes the importance of using formal vocabularies to enable machine-understandability of data and metadata, and establishing links between data and related research entities to provide their contextual information (e.g., devices and methods). To support FAIR data, PANGAEA is currently elaborating workflows to enrich instrument information of scientific datasets utilizing internal as well as third party services and ontologies and their identifiers. This abstract presents our ongoing development within the projects FREYA and FAIRsFAIR as follows:

  • Integrating the AWI O2A (Observations to Archives) framework and associated suite of tools within PANGAEA’s curatorial workflow as well as semi-automatized ingestion of observatory data.
  • Linking data with their observation sources (devices) by recording the persistent identifiers (PID) from the O2A sensor registry system (sensor.awi.de) as part of the PANGAEA instrumentation database.
  • Enriching device and method descriptions of scientific data by annotating them with appropriate vocabularies such as the NERC device type and device vocabularies or scientific methodology classifications.
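
The enrichment steps listed above could look roughly like the following Python sketch, which attaches a device PID and a controlled-vocabulary term URI to a dataset record. This is an assumption-laden illustration rather than PANGAEA's workflow: the record structure, the `enrich` function, the example PID and the vocabulary URIs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DeviceAnnotation:
    device_pid: str       # persistent identifier issued by a sensor registry
    device_type_uri: str  # term URI from a controlled vocabulary


@dataclass
class DatasetRecord:
    title: str
    devices: List[DeviceAnnotation] = field(default_factory=list)


# Hypothetical vocabulary lookup: free-text label -> term URI.
DEVICE_TYPES: Dict[str, str] = {
    "ctd": "https://vocab.example.org/device-type/0001",
    "thermosalinograph": "https://vocab.example.org/device-type/0002",
}


def enrich(record: DatasetRecord, device_pid: str, device_label: str) -> DatasetRecord:
    """Annotate a record with a device PID and a vocabulary term.

    Unknown labels are rejected so that only controlled terms end up
    in the metadata.
    """
    key = device_label.strip().lower()
    if key not in DEVICE_TYPES:
        raise KeyError(f"no controlled term for device label: {device_label!r}")
    record.devices.append(DeviceAnnotation(device_pid, DEVICE_TYPES[key]))
    return record
```

Rejecting labels that have no controlled term is one possible design choice; a curation workflow might instead queue them for manual review.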

In our contribution we will also outline the challenges to be addressed in enabling FAIR vocabularies of instruments and methods. These include questions regarding the reliability and trustworthiness of third-party ontologies and services. Further challenges concern content synchronisation across linked resources and the implications for the FAIRness levels of datasets, such as dependencies on interlinked data sources and vocabularies.

We will show to what extent adapting, harmonizing and controlling the vocabularies used, as well as the identifier systems shared between data provider and data publisher, improves the findability and re-usability of datasets while keeping the curatorial overhead as low as possible. This use case is a valuable example of how improving interoperability through harmonization efforts, though initially problematic and labor intensive, can benefit a multitude of stakeholders in the long run: data users, publishers, research institutes, and funders.

How to cite: Huber, R., Devaraju, A., Diepenbroek, M., Schindler, U., Koppe, R., Dohna, T., Gordeev, E., and Rehage, M.: Enabling Data Reuse Through Semantic Enrichment of Instrumentation, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-7058, https://doi.org/10.5194/egusphere-egu2020-7058, 2020.

D891 |
Alexander Götz, Johannes Munke, Mohamad Hayek, Hai Nguyen, Tobias Weber, Stephan Hachinger, and Jens Weismüller

LTDS ("Let the Data Sing") is a lightweight, microservice-based Research Data Management (RDM) architecture which augments previously isolated data stores ("data silos") with FAIR research data repositories. The core components of LTDS include a metadata store as well as dissemination services such as a landing page generator and an OAI-PMH server. As these core components were designed to be independent from one another, a central control system has been implemented, which handles data flows between components. LTDS is developed at LRZ (Leibniz Supercomputing Centre, Garching, Germany), with the aim of allowing researchers to make massive amounts of data (e.g. HPC simulation results) on different storage backends FAIR. Such data can often, owing to their size, not easily be transferred into conventional repositories. As a result, they remain "hidden", while only e.g. final results are published - a massive problem for reproducibility of simulation-based science. The LTDS architecture uses open-source and standardized components and follows best practices in FAIR data (and metadata) handling. We present our experience with our first three use cases: the Alpine Environmental Data Analysis Centre (AlpEnDAC) platform, the ClimEx dataset with 400TB of climate ensemble simulation data, and the Virtual Water Value (ViWA) hydrological model ensemble.
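
For readers unfamiliar with OAI-PMH, the dissemination protocol mentioned above, the sketch below builds a standard `ListRecords` harvesting request in Python. The `verb`, `metadataPrefix`, `set` and `from` arguments are defined by the OAI-PMH specification; the base URL is a placeholder and nothing here reflects the actual LTDS endpoints.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    set_spec: Optional[str] = None,
    from_date: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL.

    base_url is a placeholder for whatever endpoint a repository exposes.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec          # optional selective harvesting
    if from_date is not None:
        params["from"] = from_date        # e.g. "2020-01-01"
    return f"{base_url}?{urlencode(params)}"
```

A harvester would issue an HTTP GET on the resulting URL and follow any resumption token returned by the server to page through large result sets.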

How to cite: Götz, A., Munke, J., Hayek, M., Nguyen, H., Weber, T., Hachinger, S., and Weismüller, J.: A Lightweight, Microservice-Based Research Data Management Architecture for Large Scale Environmental Datasets, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-7937, https://doi.org/10.5194/egusphere-egu2020-7937, 2020.

D892 |
Adam Leadbetter, Andrew Conway, Sarah Flynn, Tara Keena, Will Meaney, Elizabeth Tray, and Rob Thomas

The ability to access and search metadata for marine science data is a key requirement for fulfilling the fundamental principles of data management (making data Findable, Accessible, Interoperable and Reusable) and for meeting the domain-specific, community-defined standards and legislative requirements placed on data publishers. Therefore, in the sphere of oceanographic data management, there is a clear need for a modular approach to data cataloguing that is designed to meet a number of requirements. In this paper we describe a data cataloguing system developed and in use at the Marine Institute, Ireland, to meet legislative requirements including the European Spatial Data Infrastructure (INSPIRE) and the Marine Spatial Planning directive.

The data catalogue described here makes use of a metadata model focussed on the oceanographic domain. It comprises a number of key classes, which will be described in detail in the paper and which include:

  • Dataset - combines many different parameters, collected at multiple times and locations, using different instruments
  • Dataset Collection - provides a link between a Dataset Collection Activity and a Dataset, as well as linking to the Device(s) used to sample the environment for a given range of parameters. An example of a Dataset Collection may be the Conductivity-Temperature-Depth profiles taken on a research vessel survey allowing the individual sensors to be connected to the activity and the calibration of those sensors to be connected with the associated measurements. 
  • Dataset Collection Activity - a specialised dataset to cover such activities as research vessel cruises, or the deployments of moored buoys at specific locations for given time periods
  • Platform - an entity from which observations may be made, such as a research vessel or a satellite
  • Programme - represents a formally recognized scientific effort receiving significant funding, requiring large scale coordination
  • Device - aimed at providing enough metadata for a given instance of an instrument to provide a skeleton SensorML record
  • Organisation - captures the details of research institutes, data holding centres, monitoring agencies, governmental and private organisations, that are in one way or another engaged in oceanographic and marine research activities, data & information management and/or data acquisition activities

The data model makes extensive use of controlled vocabularies to ensure both consistency and interoperability in the content of attribute fields for the Classes outlined above.
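
To make the relationship between classes and controlled vocabularies more concrete, here is a minimal Python sketch of three of the classes named above. It is only an illustration under stated assumptions: the attribute names and the tiny `PLATFORM_TYPES` vocabulary are hypothetical, and the real model is implemented as a Drupal module rather than in Python.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical controlled vocabulary for the platform type attribute.
PLATFORM_TYPES = {"research vessel", "satellite", "moored buoy"}


@dataclass
class Platform:
    name: str
    platform_type: str

    def __post_init__(self) -> None:
        # Only terms from the controlled vocabulary are accepted.
        if self.platform_type not in PLATFORM_TYPES:
            raise ValueError(f"uncontrolled platform type: {self.platform_type!r}")


@dataclass
class Device:
    name: str
    platform: Platform


@dataclass
class DatasetCollection:
    activity: str  # name of the Dataset Collection Activity, e.g. a cruise
    devices: List[Device] = field(default_factory=list)
    parameters: List[str] = field(default_factory=list)
```

Validating attribute values at construction time is one simple way to obtain the consistency that controlled vocabularies are meant to provide.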

The data model has been implemented in a module for the Drupal open-source web content management system, and the paper will provide details of this application.

How to cite: Leadbetter, A., Conway, A., Flynn, S., Keena, T., Meaney, W., Tray, E., and Thomas, R.: A modular approach to cataloguing oceanographic data, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-9750, https://doi.org/10.5194/egusphere-egu2020-9750, 2020.

D893 |
Maggie Davis, Richard Cederwall, Giri Prakash, and Ranjeet Devarakonda


Atmospheric Radiation Measurement (ARM), a U.S. Department of Energy (DOE) scientific user facility, is a key geophysical data source for national and international climate research. Utilizing a standardized schema that has evolved since ARM's inception in 1989, the ARM Data Center (ADC) processes over 1.8 petabytes of stored data across over 10,000 data products. Data sources include ARM-owned instruments, as well as field campaign datasets, Value Added Products, evaluation data to test new instrumentation or models, Principal Investigator data products, and external data products (e.g., NASA satellite data). In line with FAIR principles, a team of metadata experts classifies instruments and defines spatial and temporal metadata to ensure accessibility through the ARM Data Discovery. To enhance geophysical metadata collaboration across American and European organizations, this work will summarize processes and tools which enable the management of ARM data and metadata. For example, this presentation will highlight recent enhancements in field campaign metadata workflows to handle the ongoing Multidisciplinary Drifting Observatory for the Study of Arctic Climate (MOSAiC) data. Other key elements of the ARM Data Center include the architecture of ARM data transfer and storage processes, the evaluation of data quality, and the ARM consolidated databases. We will also discuss tools developed for identifying and recommending datastreams and enhanced DOI assignments for all data types to assist an interdisciplinary user base in selecting, obtaining, and using data as well as citing the appropriate data source for reproducible atmospheric and climate research.

How to cite: Davis, M., Cederwall, R., Prakash, G., and Devarakonda, R.: Modern Scientific Metadata Management: Atmospheric Radiation Measurement (ARM) Facility Data Center, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12281, https://doi.org/10.5194/egusphere-egu2020-12281, 2020.

D894 |
Enrico Boldrini, Paolo Mazzetti, Stefano Nativi, Mattia Santoro, Fabrizio Papeschi, Roberto Roncella, Massimiliano Olivieri, Fabio Bordini, and Silvano Pecora

The WMO Hydrological Observing System (WHOS) is a service-oriented System of Systems (SoS) linking hydrological data providers and users by enabling harmonized and real time discovery and access functionalities at global, regional, national and local scale. WHOS is being realized through a coordinated and collaborative effort amongst:

  • National Hydrological Services (NHS) willing to publish their data to the benefit of a larger audience,
  • Hydrologists, decision makers, app and portal authors willing to gain access to world-wide hydrological data,
  • ESSI-Lab of CNR-IIA responsible for the WHOS broker component: a software framework in charge of enabling interoperability amongst the distributed heterogeneous systems belonging to data providers (e.g. data publishing services) and data consumers (e.g. web portals, libraries and apps),
  • WMO Commission of Hydrology (CHy) providing guidance to WMO Member countries in operational hydrology, including capacity building, NHSs engagement and coordination of WHOS implementation.

In recent years two additional WMO regional programmes have been targeted to benefit from WHOS, operating as successful applications for others to follow:

  • Plata river basin,
  • Arctic-HYCOS.

Each programme operates with a “view” of the whole WHOS, a virtual subset composed only of the data sources that are relevant to its context.

WHOS-Plata is currently brokering data sources from the following countries:

  • Argentina (hydrological & meteorological data),
  • Bolivia (meteorological data; hydrological data expected in the near future),
  • Brazil (hydrological & meteorological data),
  • Paraguay (meteorological data; hydrological data in process),
  • Uruguay (hydrological & meteorological data).

WHOS-Arctic is currently brokering data sources from the following countries:

  • Canada (historical and real time data),
  • Denmark (historical data),
  • Finland (historical and real time data),
  • Iceland (historical and real time data),
  • Norway (historical and real time data),
  • Russia (historical and real time data),
  • United States (historical and real time data).

Each data source publishes its data online according to specific hydrological service protocols and/or APIs (e.g. CUAHSI HydroServer, USGS Water Services, FTP, SOAP, REST API, OData, WAF, OGC SOS, …). Each service protocol and API in turn implies support for a specific metadata and data model (e.g. WaterML, CSV, XML, JSON, USGS RDB, ZRXP, Observations & Measurements, …).

The WHOS broker implements mediation and harmonization of all these heterogeneous standards in order to seamlessly support discovery of and access to all the available data for a growing set of data consumer systems (applications and libraries), without any implementation effort on their part:

  • 52North Helgoland (through SOS v.2.0.0),
  • CUAHSI HydroDesktop (through CUAHSI WaterOneFlow),
  • National Water Institute of Argentina (INA) node.js WaterML client (through CUAHSI WaterOneFlow),
  • DAB JS API (through DAB REST API),
  • USGS GWIS JS API plotting library (through RDB service),
  • R scripts (through R WaterML library),
  • C# applications (through CUAHSI WaterOneFlow),
  • UCAR jOAI (through OAI-PMH/WIGOS metadata).
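
The mediation idea described above can be sketched in a few lines of Python: each source format gets its own parser, and all parsers emit one common observation model that consumers can rely on. This is a simplified illustration, not the WHOS broker code; the `Observation` model, the field names in the two payloads, and the `harmonize` function are all assumptions made for the example.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Observation:
    """Common model that every source format is mapped onto."""
    station: str
    time: str
    value: float


def from_csv(text: str) -> List[Observation]:
    # Hypothetical CSV layout with columns: station,time,value
    reader = csv.DictReader(io.StringIO(text))
    return [Observation(r["station"], r["time"], float(r["value"])) for r in reader]


def from_json(text: str) -> List[Observation]:
    # Hypothetical JSON layout using different field names.
    return [
        Observation(o["site"], o["timestamp"], float(o["level"]))
        for o in json.loads(text)
    ]


PARSERS: Dict[str, Callable[[str], List[Observation]]] = {
    "csv": from_csv,
    "json": from_json,
}


def harmonize(fmt: str, payload: str) -> List[Observation]:
    """Dispatch to the parser for the given format and return common-model data."""
    try:
        parser = PARSERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
    return parser(payload)
```

Adding support for a further provider format then means registering one more parser, while client code that consumes `Observation` objects stays unchanged.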

In particular, support for the WIGOS metadata standard provides a set of observational metadata elements for the effective interpretation of observational data internationally.

In addition to metadata and data model heterogeneity, WHOS also needs to tackle semantic heterogeneity. The WHOS broker makes use of a hydrology ontology (made available as a SPARQL endpoint) to augment WHOS discovery capabilities (e.g. to obtain translations of a hydrology search parameter in multiple languages).
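
As an indication of what such a translation lookup might involve, the Python sketch below composes a SPARQL query that retrieves all labels of the concept matching a search term. It assumes, without confirmation from the abstract, that the ontology exposes multilingual labels through `skos:prefLabel`; the function name is hypothetical and no endpoint is contacted.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"


def translation_query(term: str, lang: str = "en") -> str:
    """Compose a SPARQL query for the labels of a concept in all languages.

    Assumes (hypothetically) that labels are modelled as skos:prefLabel.
    """
    # Keep the sketch simple: refuse input that would need escaping.
    if any(ch in term for ch in '"\\\n\r'):
        raise ValueError("term contains characters that would need escaping")
    if not lang.isalpha():
        raise ValueError("language tag must be alphabetic")
    return (
        f"PREFIX skos: <{SKOS}>\n"
        "SELECT ?label (LANG(?label) AS ?language) WHERE {\n"
        f'  ?concept skos:prefLabel "{term}"@{lang} .\n'
        "  ?concept skos:prefLabel ?label .\n"
        "}"
    )
```

The resulting string could be sent to a SPARQL endpoint with any HTTP client; the labels returned in other languages can then be added to the discovery query.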

Technical documentation for exercising the WHOS broker is already available online, while the official public launch with a dedicated WMO WHOS web portal is expected shortly.

How to cite: Boldrini, E., Mazzetti, P., Nativi, S., Santoro, M., Papeschi, F., Roncella, R., Olivieri, M., Bordini, F., and Pecora, S.: WMO Hydrological Observing System (WHOS) broker: implementation progress and outcomes, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-14755, https://doi.org/10.5194/egusphere-egu2020-14755, 2020.

D895 |
Alizia Mantovani, Vincenzo Lombardo, and Fabrizio Piana

This contribution regards the encoding of an ontology for the GeologicStructure class. This is one of the sections of OntoGeonous, a bigger ontology for the geosciences principally devoted to the representation of the knowledge contained in the geological maps; the others regard the Geologic unit, Geomorphologic feature and Geologic event. OntoGeonous is developed by the University of Turin, Department of Computer Sciences, and the Institute of Geosciences and Earth Resources of the National Research Council of Italy (CNR-IGG).

The encoding of the knowledge is based on the definitions and hierarchical organization of the concepts proposed by the international standards: GeoSciML(1) and the INSPIRE Data Specification on Geology(2) drive the architecture at the more general levels, while the broader/narrower representation of the CGI vocabularies(3) provides the internal taxonomies of the specific sub-ontologies.

The first release of OntoGeonous had a complete hierarchy for the GeologicUnit class, which is partly different from the organization of knowledge of the international standard, and taxonomies for GeologicStructure, GeologicEvent and GeomorphologicFeature. The encoding process of OntoGeonous is presented in Lombardo et al. (2018) and on the WikiGeo website(4), while a method of application to geological maps is presented in Mantovani et al. (2020).

This contribution shows how the international standard guided the encoding of the sub-ontology for the GeologicStructure and the innovations introduced in the general organization of OntoGeonous compared to its first release. The main differences come from the analysis of the UML schemata for the GeologicStructure subclasses(5): first, the presence of the FoldSystem class inspired the creation of a more general class for the associations of features; second, the attempt to describe the NonDirectionalStructure class made us group all the remaining classes into a new class with opposite characteristics. Similar modifications have been made all over the GeologicStructure ontology.

Our intent is to improve the formal description of geological knowledge in order to practically support the use of ontology-driven data models in the geological mapping task.



Lombardo, V., Piana, F., Mimmo, D. (2018). Semantics–informed geological maps: Conceptual modelling and knowledge encoding. Computers & Geosciences. 116. 10.1016/j.cageo.2018.04.001. 


Mantovani, A., Lombardo, V., Piana, F. (2020). Ontology-driven representation of knowledge for geological maps. (Submitted)


(1) http://www.geosciml.org. 

(2) http://inspire.jrc.ec.europa.eu/documents/Data_Specifications/INSPIRE_DataSpecification_GE_v3.0.pdf 

(3) http://resource.geosciml.org/def/voc/

(4) https://www.di.unito.it/wikigeo/index.php?title=Pagina_principale

(5) http://www.geosciml.org/doc/geosciml/4.1/documentation/html/EARoot/EA1/EA1/EA4/EA4/EA356.htm


How to cite: Mantovani, A., Lombardo, V., and Piana, F.: OntoGeonous-GS: Implementation of an ontology for the geologic structures from the IUGS CGI and INSPIRE standards, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-15226, https://doi.org/10.5194/egusphere-egu2020-15226, 2020.

D896 |
Alexandra Kokkinaki, Justin Buck, Emma Slater, Julie Collins, Raymond Cramer, and Louise Darroch

Ocean data are expensive to collect. Data reuse saves time and accelerates the pace of scientific discovery. For data to be re-usable the FAIR principles reassert the need for rich metadata and documentation that meet relevant community standards and provide information about provenance.

Approaches to describing sensor observations are often inadequate at meeting FAIR: they are prescriptive, with a limited set of attributes, while making little or no provision for important metadata about sensor observations later in the data lifecycle.

As part of the EU ENVRIplus project, our work aimed to capture the delayed-mode data curation process taking place at the National Oceanography Centre’s British Oceanographic Data Centre (BODC). Our solution uses unique URIs, OGC SWE standards and controlled vocabularies, commencing with the submitted originator’s input and ending with the archived and published dataset.

The BODC delayed-mode process is an example of a physical system that is composed of several components, such as sensors, and other computational processes, such as an algorithm to compute salinity or absolute winds. All components are described in SensorML, identified by unique URIs, and associated with the relevant datastreams, which in turn are exposed on the web via ERDDAP using unique URIs.
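
To illustrate how a datastream exposed through ERDDAP can be addressed by URL, the Python sketch below assembles a tabledap request of the general form `<server>/tabledap/<datasetID><fileType>?<variables>&<constraints>`. The server address, dataset identifier and variable names used with it are placeholders, not BODC's actual identifiers.

```python
from typing import Iterable, Sequence
from urllib.parse import quote


def tabledap_url(
    server: str,
    dataset_id: str,
    variables: Sequence[str],
    constraints: Iterable[str] = (),
    file_type: str = ".csv",
) -> str:
    """Build an ERDDAP tabledap request URL.

    Constraints such as "time>=2020-01-01T00:00:00Z" are percent-encoded,
    leaving only "=" untouched, because characters like ">" are not
    safe in a URL query.
    """
    query = ",".join(variables)
    for constraint in constraints:
        query += "&" + quote(constraint, safe="=")
    return f"{server}/tabledap/{dataset_id}{file_type}?{query}"
```

Because the dataset identifier is part of the URL path, each sensor datastream gets a stable, resolvable address that a SensorML description can point to.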

In this paper we share our experience in using OGC standards and ERDDAP to model the above-mentioned process and publish the associated datasets in a unified way. The benefits attained include greater automation of data transfer, easy access to large volumes of data from a chosen sensor, more precise capture of data provenance, and standardization; together they pave the way towards greater FAIRness of the sensor data and metadata, focusing on delayed-mode processing.

How to cite: Kokkinaki, A., Buck, J., Slater, E., Collins, J., Cramer, R., and Darroch, L.: Using standards to model delayed mode sensor processes, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-15466, https://doi.org/10.5194/egusphere-egu2020-15466, 2020.

D897 |
Louise Darroch, Juan Ward, Alexander Tate, and Justin Buck

More than 40% of the human population live within 100 km of the sea. Many of these communities intimately rely on the oceans for their food, climate and economy. However, the oceans are increasingly being adversely affected by human-driven activities such as climate change and pollution. Many targeted marine monitoring programmes (e.g. GO-SHIP, OceanSITES) and pioneering observing technologies (e.g. autonomous underwater vehicles, Argo floats) are being used to assess the impact humans are having on our oceans. Such activities and platforms are deployed, calibrated and serviced by state-of-the-art research ships, multimillion-pound floating laboratories which operate diverse arrays of high-powered, high-resolution sensors around the clock (e.g. sea-floor depth, weather, ocean current velocity and hydrography). These sensors, coupled with event and environmental metadata provided by the ships’ logs and crew, are essential for understanding the wider context of the science they support, as well as directly contributing to crucial scientific understanding of the marine environment and key strategic policies (e.g. the United Nations’ Sustainable Development Goal 14). However, despite their high scientific value and cost, these data streams are not routinely brought together from UK large research vessels in the coordinated, reliable and accessible ways that are fundamental to ensuring user trust in the data and any products generated from the data.

The National Oceanography Centre (NOC) and British Antarctic Survey (BAS) are currently working together to improve the integrity of the data management workflow from sensor systems to end-users across the UK Natural Environment Research Council (NERC) large research vessel fleet, making cost-effective use of vessel time while improving the FAIRness of data from these sensor arrays. The solution is based upon an Application Programming Interface (API) framework with endpoints tailored towards different end-users, such as scientists on board the vessels as well as the public on land. Key features include:

  • Sensor triage using real-time automated monitoring systems, assuring sensors are working correctly and only the best data are output;
  • Standardised digital event logging systems allowing data quality issues to be identified and resolved quickly;
  • Novel open-source data transport formats that are embedded with well-structured metadata, common standards and provenance information (such as controlled vocabularies and persistent identifiers), reducing ambiguity and enhancing interoperability across platforms;
  • An open-source data processing application that applies quality control to international standards (SAMOS, or IOOS QARTOD);
  • Digital notebooks that manage and capture processing applied to data, putting data into context;
  • Democratisation and brokering of data through open data APIs (e.g. ERDDAP, Sensor Web Enablement), allowing end-users to discover and access data, layer their own tools or generate products to meet their own needs;
  • Unambiguous provenance that is maintained throughout the data management workflow using instrument persistent identifiers, part of the latest recommendations by the Research Data Alliance (RDA).
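
As an example of the kind of automated quality control referred to above, the Python sketch below implements a gross range check in the style of the QARTOD tests, using the QARTOD flag values (1 = pass, 3 = suspect, 4 = fail, 9 = missing). It is a generic illustration, not the project's processing application, and any thresholds passed to it are the caller's own assumptions.

```python
from typing import Optional, Tuple

# QARTOD-style quality flags.
PASS, SUSPECT, FAIL, MISSING = 1, 3, 4, 9


def gross_range_flag(
    value: Optional[float],
    sensor_span: Tuple[float, float],
    user_span: Optional[Tuple[float, float]] = None,
) -> int:
    """Flag a single measurement.

    sensor_span: limits the instrument can physically report (outside -> FAIL).
    user_span:   narrower, locally plausible limits (outside -> SUSPECT).
    """
    if value is None:
        return MISSING
    sensor_min, sensor_max = sensor_span
    if value < sensor_min or value > sensor_max:
        return FAIL
    if user_span is not None:
        user_min, user_max = user_span
        if value < user_min or value > user_max:
            return SUSPECT
    return PASS
```

Keeping the flag alongside each value, rather than discarding flagged data, preserves the provenance of the quality decision for later users.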

Access to universally interoperable oceanic data, with known quality and provenance, will empower a broad range of stakeholder communities, creating opportunities for innovation and impact through data use, re-use and exploitation.

How to cite: Darroch, L., Ward, J., Tate, A., and Buck, J.: Continuous ocean monitoring from sensor arrays on the UK large research vessels, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18848, https://doi.org/10.5194/egusphere-egu2020-18848, 2020.

D898 |
Barbara Magagna, Gwenaelle Moncoiffe, Anusuriya Devaraju, Pier Luigi Buttigieg, Maria Stoica, and Sirko Schindler

In October 2019, a new working group (InteroperAble Descriptions of Observable Property Terminology, or I-ADOPT WG) officially launched its 18-month workplan under the auspices of the Research Data Alliance (RDA), co-led by ENVRI-FAIR project members. The goal of the group is to develop a community-wide, consensus framework for representing observable properties and facilitating semantic mapping between disjoint terminologies used for data annotation. The group has been active for over two years and comprises research communities, data centers, and research infrastructures from environmental sciences. The WG members have been heavily involved in developing or applying terminologies to semantically enrich the descriptions of measured, observed, derived, or computed environmental data. They all recognize the need to enhance interoperability between their efforts through the WG’s activities.

Ongoing activities of the WG include gathering user stories from research communities (Task 1), reviewing related terminologies and current annotation practices (Task 2) and, based on this, defining and iteratively refining requirements for a community-wide semantic interoperability framework (Task 3). Much like a generic blueprint, this framework will be a basis upon which terminology developers can formulate local design patterns while at the same time remaining globally aligned. This framework will assist interoperability between machine-actionable complex property descriptions observed across the environmental sciences, including Earth, space, and biodiversity science. The WG will seek to synthesize well-adopted but still disparate approaches into global best practice recommendations for improved alignment. Furthermore, the framework will help mediate between generic observation standards (O&M, SSNO, SensorML, OBOE, ...) and current community-led terminologies and annotation practices, fostering harmonized implementations of observable property descriptions. Altogether, the WG’s work will boost the Interoperability component of the FAIR principles (especially principle I3) by encouraging convergence and by enriching the terminologies with qualified references to other resources. We envisage that this will greatly enhance the global effectiveness and scope of tools operating across terminologies. The WG will thus strengthen existing collaborations and build new connections between terminology developers and providers, disciplinary experts, and representatives of scientific data user groups.

In this presentation, we introduce the working group to the EGU community, and invite them to join our efforts. We report the methodology applied, the results from our first three tasks and the first deliverable, namely a catalog of domain-specific terminologies in use in environmental research, which will enable us to systematically compare existing resources for building the interoperability framework. 


How to cite: Magagna, B., Moncoiffe, G., Devaraju, A., Buttigieg, P. L., Stoica, M., and Schindler, S.: Towards an interoperability framework for observable property terminologies, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19895, https://doi.org/10.5194/egusphere-egu2020-19895, 2020.

D899 |
Jan Schulte, Laura Helene Zepner, Stephan Mäs, Simon Jirka, and Petra Sauer

Over the last few years, a broad range of open data portals has been set up. The aim of these portals is to improve the discoverability of open data resources and to strengthen the re-use of data generated by public agencies as well as research activities.

Often, such open data portals offer an immense amount of different types of data that may be relevant for a user. Thus, in order to facilitate the efficient and user-friendly exploration of available data sets, it is essential to visualize the data as quickly and easily as possible. While the visualization of static data sets is already well covered, selecting appropriate visualization approaches for potentially highly-dynamic spatio-temporal data sets is often still a challenge.

Within our contribution, we will introduce a preliminary study conducted by the mVIZ project, which is funded by the German Federal Ministry of Transport and Digital Infrastructure as part of the mFUND programme. This project introduces a methodology to support the selection and creation of user-friendly visualizations for data discoverable via open data portals such as the mCLOUD. During this process, specific consideration is given to properties and metadata of the datasets as input for a decision workflow that suggests appropriate visualization types. A resulting guideline will describe the methodology and serve as a basis for the conception, extension or improvement of visualization tools, or for their further development and integration into open data portals.
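
A decision workflow of this kind can be pictured as a small set of rules over dataset properties. The Python sketch below is purely illustrative: the three properties and the suggested visualization types are assumptions made for the example and are not the rules developed in the mVIZ project.

```python
from typing import List


def suggest_visualizations(
    has_location: bool, has_time: bool, is_numeric: bool
) -> List[str]:
    """Map simple dataset properties to candidate visualization types.

    The rules are hypothetical placeholders for a real decision workflow.
    """
    if has_location and has_time:
        return ["animated map", "map with time slider"]
    if has_location:
        return ["choropleth map" if is_numeric else "point map"]
    if has_time:
        return ["line chart" if is_numeric else "event timeline"]
    return ["bar chart" if is_numeric else "table"]
```

In practice the inputs would be derived from portal metadata (for example, the presence of a geometry column or a temporal extent) rather than supplied by hand.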

The project focuses particularly on the creation of an inventory of open spatiotemporal data in open data portals as well as an overview of available visualization and analysis tools, the development of a methodology for selecting appropriate visualizations for the spatio-temporal data, and the development of a demonstrator for supporting the visualization of selected data sets.

How to cite: Schulte, J., Zepner, L. H., Mäs, S., Jirka, S., and Sauer, P.: Supporting Users to Find Appropriate Visualizations of Spatio-Temporal Open Data Sets, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-20448, https://doi.org/10.5194/egusphere-egu2020-20448, 2020.

D900 |
Tara Keena, Adam Leadbetter, Andrew Conway, and Will Meaney

The ability to access and search metadata for marine science data is a key requirement for fulfilling the fundamental principles of data management (making data Findable, Accessible, Interoperable and Reusable) and for meeting the domain-specific, community-defined standards and legislative requirements placed on data publishers. One of the foundations of effective data management is appropriate metadata cataloguing: the storing and publishing of descriptive metadata for end users to query online. However, with ocean observing systems constantly evolving and the number of autonomous platforms and sensors growing, the volume and variety of data are constantly increasing, and metadata catalogue volumes are therefore also expanding. Data catalogue infrastructures must be able to scale with data growth without incurring significant additional overhead in terms of technical infrastructure and financial costs.

To address some of these challenges, GitHub and Travis CI offer a potential solution for maintaining scalable data catalogues and hosting a variety of file types, all with minimal overhead costs.

  • GitHub is a repository hosting platform for version control and collaboration, and can be used with documents, computer code, or many other file formats
  • GitHub Pages is a static website hosting service designed to host web pages directly from a GitHub repository
  • Travis CI is a hosted, distributed continuous integration service used to build and test projects hosted at GitHub

GitHub supports the implementation of a data catalogue as it stores metadata records of different formats in an online repository which is openly accessible and version controlled. The base metadata of the data catalogue in the Marine Institute is ISO 19115/19139 based XML which is in compliance with the INSPIRE implementing rules for metadata. However, using Travis CI, hooks can be provided to build additional metadata records and formats from this base XML, which can also be hosted in the repository. These formats include:

  • DataCite metadata schema - allowing a completed data description entry to be exported in support of the minting of Digital Object Identifiers (DOI) for published data
  • Resource Description Framework (RDF) - as part of the semantic web and linked data
  • Ecological Metadata Language (EML) - for the Global Biodiversity Information Facility (GBIF), which is used to share information about where and when species have been recorded
  • Schema.org XML - which creates a structured data mark-up schema to improve search engine optimisation (SEO)
  • HTML - the standard mark-up language for web pages, which can be used to represent the XML as web pages for end users to view the catalogue online
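
The derivation of additional formats from the base XML can be illustrated with a short Python sketch that reads the title from an ISO 19139 record and renders a minimal HTML page, the kind of step a continuous integration job could run on every commit. The `gmd` and `gco` namespaces and the title path follow ISO 19139; the function name and the HTML layout are assumptions, and the Marine Institute's actual build scripts are not described in the abstract.

```python
import xml.etree.ElementTree as ET
from html import escape

# ISO 19139 XML namespaces.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Path from the gmd:MD_Metadata root element to the dataset title.
TITLE_PATH = (
    "gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/"
    "gmd:CI_Citation/gmd:title/gco:CharacterString"
)


def record_to_html(xml_text: str) -> str:
    """Render a very small HTML page for one ISO 19139 metadata record."""
    root = ET.fromstring(xml_text)
    node = root.find(TITLE_PATH, NS)
    title = node.text.strip() if node is not None and node.text else "Untitled record"
    safe_title = escape(title)  # avoid injecting mark-up from the metadata
    return (
        f"<html><head><title>{safe_title}</title></head>"
        f"<body><h1>{safe_title}</h1></body></html>"
    )
```

The same pattern, reading fields from the base record and writing them into another template, would apply to the other target formats listed above.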

As well as hosting the various file types, GitHub Pages can also render the generated HTML pages as static web pages. This allows users to view and search the catalogue online via a generated static website.

GitHub's ability to host and version control metadata files, and to render them as web pages, allows for an easier and more transparent generation of an online data catalogue while catering for scalability, hosting and security.

How to cite: Keena, T., Leadbetter, A., Conway, A., and Meaney, W.: Scaling metadata catalogues with web-based software version control and integration systems, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21258, https://doi.org/10.5194/egusphere-egu2020-21258, 2020.

D901 |
Jovanka Gulicoska, Koushik Panda, and Hervé Caumont

OpenSearch is a de-facto standard specification and a collection of technologies that allow publishing of search results in a format suitable for syndication and aggregation. It is a way for websites and search engines to publish search results in a standard and accessible format.

Evolved through extensions within an international standards organisation, the Open Geospatial Consortium (OGC), OpenSearch has become a reference for querying repositories that contain Earth Observation (EO) information, for sending and receiving structured, standardized search requests and results, and for allowing syndication of repositories. In this evolved form it is a shared API used by many applications, tools, portals and sites in the Earth sciences community. The OGC OpenSearch extensions have been implemented for the NextGEOSS DataHub following the OGC standards, and the implementation has been validated as fully compatible with the standard.

The OGC OpenSearch extensions implemented for CKAN, the open source software solution supporting the NextGEOSS DataHub, add the standardized metadata models and the OpenSearch API endpoints that allow the indexing of distributed EO data sources (currently over 110 data collections), and make these available to client applications to perform queries and get the results. This allowed us to develop a simple user interface as part of the NextGEOSS DataHub Portal, which implements the two-step search mechanism (leveraging data collection metadata and data product metadata) and translates the filtering done by users into a matching OpenSearch query. The user interface can render a general description document, which contains information about the collections available on the NextGEOSS DataHub, and then get a more detailed description document for each collection separately.
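The translation of user filters into an OpenSearch query can be sketched as filling in the URL template published in a description document. The example below is a minimal illustration in Python, not the NextGEOSS implementation: the endpoint `https://example.org/opensearch/search.atom` and the function name `fill_template` are hypothetical, while `{searchTerms}`, `{geo:box}`, `{time:start}` and `{time:end}` are parameter names from the OpenSearch specification and its OGC Geo and Time extensions. A trailing `?` marks a parameter as optional.

```python
import re
from urllib.parse import quote

# Hypothetical URL template, as it might appear in a description document.
TEMPLATE = (
    "https://example.org/opensearch/search.atom"
    "?q={searchTerms?}&bbox={geo:box?}&start={time:start?}&end={time:end?}"
)


def fill_template(template: str, values: dict) -> str:
    """Replace each {name} or {name?} placeholder with a URL-encoded value."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        optional = match.group(2) == "?"
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            # Optional parameters without a value are left empty.
            return ""
        raise KeyError(f"missing required OpenSearch parameter: {name}")

    return re.sub(r"\{([^{}?]+)(\??)\}", substitute, template)


# Filters a user might select in a portal, expressed as template parameters.
query_url = fill_template(
    TEMPLATE,
    {"geo:box": "-10,50,-5,55", "time:start": "2020-01-01"},
)
```

A client following the two-step mechanism would apply the same routine twice: first against the template of the general description document to find a collection, then against the template of that collection's own description document to find products.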

To generate the structure of the description documents and the results feed, we use CKAN’s templates, supplemented by additional files that list all available parameters and their options and validate the query before it is executed. The search endpoint that returns the results feed uses CKAN’s existing API calls to perform the validation and to get the filtered results, taking into consideration the parameters of the user search.

The current NextGEOSS DataHub implementation therefore provides a user interface for users who are not familiar with Earth observation data collections and products, so that they can easily create queries and access the results. Moreover, the NextGEOSS project partners are constantly adding data connectors and collecting new data sources that will become available through the OGC OpenSearch Extensions API. This will allow NextGEOSS to provide a variety of data for users and accommodate their needs.


NextGEOSS is an H2020 Research and Development Project from the European Community under grant agreement 730329.

How to cite: Gulicoska, J., Panda, K., and Caumont, H.: OpenSearch API for Earth observation DataHub service, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21522, https://doi.org/10.5194/egusphere-egu2020-21522, 2020.

D902 |
Mario D'Amore, Andrea Naß, Martin Mühlbauer, Torsten Heinen, Mathias Boeck, Jörn Helbert, Torsten Riedlinger, Ralf Jaumann, and Guenter Strunz

For planetary sciences, the main archives providing access to mission data are ESA's Planetary Science Archive (PSA) and the Planetary Data System (PDS) nodes in the USA. Along with recent and upcoming planetary missions, the amount of different data (remote sensing/in-situ data, derived products) increases constantly and serves as the basis for scientific research, resulting in derived scientific data and information. Within missions to Mercury (BepiColombo), the Outer Solar System moons (JUICE), and asteroids (NASA's Dawn), one way of scientific analysis, the systematic mapping of surfaces, has received new impulses, also in Europe. These systematic surface analyses are based on the numeric and visual comparison and combination of different remote sensing data sets, such as optical image data, spectral/hyperspectral sensor data, radar images, and/or derived products like digital terrain models. The analyses mainly result in map figures, data, and profiles/diagrams, and serve to describe research investigations within scientific publications.

To handle these research products equivalently to the missions' base data in the main archives, web-based geographic information systems have become a common means of imparting spatial knowledge to all kinds of possible users in recent years. Consequently, further platforms and initiatives have emerged that handle planetary data within web-based GIS, services, and/or virtual infrastructures. Those systems are built either upon proprietary software environments or, more commonly, upon a well-established stack of open source software such as PostgreSQL, GeoServer (a server for sharing geospatial data) and a graphical user interface based on JavaScript. Applicable standards developed by the Open Geospatial Consortium (OGC), such as the Web Map Service (WMS) and the Web Feature Service (WFS), serve as the interface between the user interface and the server-based data storage.
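To make the role of the OGC interfaces more concrete, the following minimal Python sketch assembles a WMS 1.3.0 GetMap request of the kind a web client sends to a map server. The endpoint `https://example.org/geoserver/wms`, the layer name `planetary:example_mosaic` and the function name `build_getmap_url` are hypothetical and do not refer to the system described here. `CRS:84` is used only as a placeholder; a planetary service would advertise body-specific coordinate reference systems in its capabilities document.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_getmap_url(base_url, layer, bbox, width=512, height=256):
    """Assemble a WMS 1.3.0 GetMap URL for one layer and bounding box."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": "CRS:84",  # placeholder CRS with longitude/latitude axis order
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and layer name, for illustration only.
url = build_getmap_url(
    "https://example.org/geoserver/wms",
    "planetary:example_mosaic",
    (-180, -90, 180, 90),
)
# Parse the query string back to inspect the parameters that were sent.
query = parse_qs(urlsplit(url).query, keep_blank_values=True)
```

A WFS request follows the same key-value pattern but returns vector features instead of a rendered image, which is why the two services are commonly paired behind a JavaScript map client.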


This contribution presents a prototypical system for the structured storage and visualization of planetary data compiled and developed within, or with the contribution of, the Institute for Planetary Research (PF, DLR). It enables user groups to store and spatially explore research products centrally and sustainably across multiple missions and scientific disciplines [1].


Technically, the system is based on two components: 1) an infrastructure that provides data storage and management capabilities as well as OGC-compliant interfaces for collaborative and web-based data access services, such as the EOC Geoservice [2]; 2) UKIS (Environmental and Crisis Information Systems), a framework developed at DFD for the implementation of geoscientific web applications [3]. The prototype is substantially based on a recent approach developed within PF [4], in which an existing database established at the Planetary Spectroscopy Laboratory (PSL), handling different kinds of spatial data, meets a vector-based data collection of thematic, mainly geologic and geomorphologic, mapping results [5].


An information system of this kind is essential to ensure the efficient and sustainable utilization of the information already obtained and published. This is considered a prerequisite for guaranteeing a continuous and long-term use of scientific information and knowledge within institutional frameworks.


[1] Naß et al. (2019) EPSC #1311

[2] Dengler et al. (2013) PV 2013, elib.dlr.de/86351/

[3] Mühlbauer (2019) dlr.de/eoc/UKIS/en/

[4] Naß, D'Amore, Helbert (2017) EPSC #646-1

[5] Naß, Dawn Science Team (2019) EPSC #1304

How to cite: D'Amore, M., Naß, A., Mühlbauer, M., Heinen, T., Boeck, M., Helbert, J., Riedlinger, T., Jaumann, R., and Strunz, G.: Research products across space missions – a prototype for central storage, visualization and usability, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21882, https://doi.org/10.5194/egusphere-egu2020-21882, 2020.

D903 |
Elizabeth Tray, Adam Leadbetter, Will Meaney, Andrew Conway, Caoimhín Kelly, Niall O’Maoileidigh, Elvira De Eyto, Siobhan Moran, and Deirdre Brophy

Scales and otoliths (ear stones) from fish are routinely sampled for age estimation and stock management purposes. Growth records from scales and otoliths can be used to generate long-term time series data and, in combination with environmental data, can reveal species-specific population responses to a changing climate. Additionally, scale and otolith microchemical data can be utilized to investigate fish habitat usage and migration patterns. A common problem associated with biological collections is that while sample intake grows, long-term digital and physical storage is rarely a priority. Material is often collected to meet short-term objectives and resources are seldom committed to maintaining and archiving collections. As a consequence, precious samples are frequently stored in many different and unsuitable locations, and may become lost or separated from associated metadata. The Marine Institute’s ecological research station in Newport, Ireland, holds a multi-decadal (1928-2020) collection of scales and otoliths from various fish species, gathered from many geographic locations. Here we present an open-source database and archiving system to consolidate and digitize this collection, and show how this case study infrastructure could be used for other biological sample collections. The system follows the FAIR (Findable, Accessible, Interoperable and Reusable) open data principles, and includes a physical repository, sample metadata catalogue, and image library.

How to cite: Tray, E., Leadbetter, A., Meaney, W., Conway, A., Kelly, C., O’Maoileidigh, N., De Eyto, E., Moran, S., and Brophy, D.: An open-source database and collections management system for fish scale and otolith archives, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-22021, https://doi.org/10.5194/egusphere-egu2020-22021, 2020.