ESSI3.1

Best Practices and Realities of Research Data Repositories: Balancing the needs of Repositories, Researchers and Publishers

As funders and publishers increasingly require that research data be made publicly available, research data repositories, especially in the Earth and environmental sciences, play a new and major role in the publication process. They are on one hand supporting researchers during the data publication process and, on the other hand, need to present the data in a way that they are fully integrable in the ecosystem of modern scientific communication as required by the FAIR Data Principles. More recent developments, like the CoreTrustSeal Certification and the Enabling FAIR Data Commitment Statement have defined additional benchmarks and expectations for the capabilities of repositories.

How do repositories comply with increasing expectations for machine accessibility of their data and the requirements for machine learning (particularly for long tail data)? How do researchers know which repositories meet these benchmarks and future expectations? How should publishers work together with repositories and researchers to ensure a more complete record of science? What role can data journals and editors play?

This session will showcase the range of practices in research data repositories, data publication and the integration of data, software, samples, models and notebooks into the scholarly publication process. It invites repositories, researchers, information scientists, journals, and editors to discuss challenges they are facing in meeting community best practice.

Convener: Kirsten Elger | Co-conveners: Lesley Wyborn, Kristin Vanderbilt, Amber Budden, Alice Fremand (ECS)
Presentations | Fri, 27 May, 08:30–11:05 (CEST) | Room 0.31/32

Chairpersons: Kirsten Elger, Kristin Vanderbilt, Lesley Wyborn
08:30–08:33
08:33–08:43 | EGU22-12105 | solicited | On-site presentation
Florian Haslinger, Jerry Carter, Helle Pedersen, Jonathan Schaeffer, Robert Casey, Javier Quinteros, and Angelo Strollo

Many geophysical data centers are being asked by their sponsors and funding agencies to report in greater detail than in the past on which data and services are used, by whom, and for what purpose. Previously, bulk information about the number of users or accesses and download volumes was deemed sufficient in most cases. Up to now, data centers have generally offered anonymous access to large parts of their holdings, with different approaches to basic monitoring and access logging, e.g. by IP address, which serves as a rough proxy for inferring the geographical distribution of users.

Already today, access to embargoed or otherwise restricted data, or to advanced functions like personal work spaces and computational resources, is usually protected by user authentication and authorisation. Standardization of the identity management protocols is a requirement for further supporting the federation of data centers and their services, also in light of future integration with cloud services or other integrated services. For example in seismology, federated data retrieval systems follow a specific credential process based on standards for data exchange and web services established and maintained by the International Federation of Digital Seismograph Networks (FDSN). 

These new information requirements from funding agencies would, however, require implementing identity management systems and some sort of user identification / authentication to many or all data center services and resources. This is raising concerns within the data centers on a number of aspects: Evidence from other domains demonstrates that requiring authentication reduces the use of data center services; enforcing authentication is often perceived as being not in line with best practices for open science; implementing identity management for usage profiling may lead to significantly increased effort at the data centers, especially with regard to compliance with data protection legislation like GDPR, and it may significantly impede automated (scripted) machine-to-machine access; the level of detail that should be reported back to funding agencies is unclear and there are doubts whether detailed user profiling is a reasonable ‘performance indicator’. Indeed, such knowledge gathering on users needs to be obtained through technical implementations that take into account the impact on user experience, the impact on decades of research tool development, and the resources necessary to implement and operate such systems, whether embedded into the operational services or taking other forms such as surveys and outreach to user groups.

Relevant discussions have now started among representatives of major geophysical data centers so that interim plans can be shared, ideas and experiences exchanged, and standard approaches developed and recommended for consideration by the community. In these discussions we consider both scenarios where identity management is useful and relevant and scenarios where we may consolidate our views and arguments with respect to the general user data reporting requests.

How to cite: Haslinger, F., Carter, J., Pedersen, H., Schaeffer, J., Casey, R., Quinteros, J., and Strollo, A.: User Identification and Authentication for Geophysical Data Centers: Exploring a Difficult Transition, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-12105, https://doi.org/10.5194/egusphere-egu22-12105, 2022.

08:43–08:50 | EGU22-10800
Shelley Stall and Chris Erdmann

As a researcher, how easily you can access well-documented research datasets and software relevant to your work varies with your discipline and other factors. When it works well, you benefit from the ability to easily analyze and perhaps use those data and software. When it works poorly, you are sending emails to get access to datasets, asking for more information, hoping you will get responses, and then maybe trusting that you understand the data or software well enough to integrate, rework, or explore further.

How do we create more of the “beneficial” experience? How do we create a culture where having better tools, practices, and methods helps us achieve this goal? It takes deliberate intent and patience in taking those first steps. Many of the difficult scientific questions still in front of us require easier access to more usable data and software, enabling us to “see” the complex systems of the universe better.

In this talk, we will share the work happening in AGU, their collaborators, and the broader community to take those initial steps, and support the culture of the future.

How to cite: Stall, S. and Erdmann, C.: The Culture of Open Science: Building an Ethos that Feels like Home across the Earth, Space and Environmental Sciences -- One Step at a Time, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-10800, https://doi.org/10.5194/egusphere-egu22-10800, 2022.

08:50–08:57 | EGU22-10546
Christopher Erdmann and Shelley Stall

Data underlying published studies is difficult to find or access, which can hinder new scientific research. Currently, only about 20% of published papers have their supporting data in discoverable and accessible repositories. The AGU, working with our partners (Dryad, CHORUS, ESIP, Wiley), and supported by the National Science Foundation (NSF), will focus on improving guidance and workflows to properly manage, link, and track data and software references throughout the publication pipeline. The resulting best practices will serve as a resource for AGU editors, reviewers, and authors and help advance data and software publication policies. Beyond AGU, this work will serve as a model for linking information across funders, data repositories, and publishers, and improving public access to research outputs. In this talk, current publication practices as they relate to the FAIR principles will be described, together with lessons learned, and how workflows and guidance are being improved.

How to cite: Erdmann, C. and Stall, S.: Accelerating Open and FAIR Data Practices Across the Earth, Space, and Environmental Sciences, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-10546, https://doi.org/10.5194/egusphere-egu22-10546, 2022.

08:57–09:04 | EGU22-13354 | ECS | On-site presentation
Melanie Lorenz, Kirsten Elger, Inke Achterberg, Marcel Meistring, Norbert Pfurr, and Malte Semmler

The change towards Open Science practices is increasingly demanded by science policy and affects the publication culture as well as the information infrastructures. This includes the transition to Open Access for journals and publishers (including the development of new business models), as well as the growing need to make data, scientific software and samples underlying scientific results available for the general public.  

The DFG-funded Specialised Information Service for Geoscience FID GEO (Fachinformationsdienst Geowissenschaften) aims at (1) reducing structural deficits in the area of electronic information and (2) promoting Open Science throughout the research life cycle. The service is hosted at the Göttingen State and University Library in Lower Saxony (SUB Göttingen) and the GFZ German Research Centre for Geosciences in Potsdam. The FID GEO team is made up of highly connected librarians, data publishing professionals, and geoscientists. Over the past five years, FID GEO has become a key player in the promotion of Open Science in the geosciences and occupies a central position for connecting researchers, data repositories, information infrastructures, German geosciences societies and publishers. FID GEO actively offers data and text publication services via its associated repositories GFZ Data Services and GEO-LEO e-docs, as well as digitisation on demand of print-only geoscience literature and maps.

FID GEO aims to inform the geoscientific community about all aspects of Open Science on the one hand and, on the other hand, is available for questions and support, e.g., during data publications, the transition to an Open Access model for journals of geosciences societies, or getting a DOI for an article in the Green Open Access model. An online questionnaire in 2021 revealed a high demand for information, particularly on topics such as licenses, persistent identifiers (ORCID, ROR, IGSN) and measures to ensure data quality and integrity in order to enable high-quality, citable data publications. In the first funding phases of the project, workshops and talks have proven to be very successful tools to meet the large need for discussion, as they allow questions or uncertainties regarding practical aspects to be addressed directly. Information events are prepared specifically for the individual target groups: researchers, German geosciences societies and members of infrastructural support units, like libraries (e.g., while societies are more interested in the development of guidelines, librarians have a specific interest in licensing and copyright issues). To intensify the open information culture in the geosciences, FID GEO collaborates with strategic (inter)national initiatives, such as NFDI4Earth, COPDESS and OneGeochemistry.

How to cite: Lorenz, M., Elger, K., Achterberg, I., Meistring, M., Pfurr, N., and Semmler, M.: FID GEO: Promoting Open Science in the Geosciences, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-13354, https://doi.org/10.5194/egusphere-egu22-13354, 2022.

09:04–09:11 | EGU22-7291 | ECS
Alice Fremand, Julien Bodart, Tom Jordan, Fausto Ferraccioli, Carl Robinson, Hugh Corr, Helen Peat, Robert Bingham, and David Vaughan

Over the past 50 years, the British Antarctic Survey (BAS) has been one of the major acquirers of airborne geophysical data over Antarctica, providing scientists with gravity, magnetics and radar datasets that have been central to many studies of the past, present, and future evolution of the Antarctic Ice Sheet. Until recently, many of these datasets had not been published in full, restricting further use of the data for different glaciological and geophysical applications. Starting in 2020, scientists and data managers at the British Antarctic Survey have worked on standardising and releasing large swaths of aerogeophysical data acquired during the period 1994-2020, including a total of 64 datasets from 24 different surveys, amounting to ~450,000 line-km (or 5.3 million km²) of data across West Antarctica, East Antarctica, and the Antarctic Peninsula. Amongst these are the extensive surveys over the fast-changing Pine Island (2004-05) and Thwaites (2018-20) glacier catchments. Considerable effort has been made to standardise these datasets to comply with the FAIR (Findable, Accessible, Interoperable and Re-Usable) data principles, as well as to create a new Polar Airborne Geophysics Data Portal (https://www.bas.ac.uk/project/nagdp/), which serves as a user-friendly interface to interact with and download the newly published data. Here, we review how these datasets were acquired and processed, and present the methods used to standardise them. We then discuss the new data portal infrastructure and interactive tutorials that were created to improve the accessibility of the data. We believe that these newly released data will be a valuable asset to future geophysical and glaciological studies over Antarctica and will significantly extend the life cycle of the data. All datasets included in this data release are now fully accessible at the UK Polar Data Centre, which is now certified by CoreTrustSeal: https://data.bas.ac.uk

How to cite: Fremand, A., Bodart, J., Jordan, T., Ferraccioli, F., Robinson, C., Corr, H., Peat, H., Bingham, R., and Vaughan, D.: British Antarctic Survey’s Aerogeophysics Data: Releasing 25 Years of Gravity, Magnetics, and Radar Datasets over Antarctica, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-7291, https://doi.org/10.5194/egusphere-egu22-7291, 2022.

09:11–09:18 | EGU22-12563 | Virtual presentation
Florian Ott, Kirsten Elger, and Simone Frenzel

GFZ Data Services, hosted at the GFZ German Research Centre for Geosciences (GFZ), is a domain repository for geosciences data that has assigned digital object identifiers (DOIs) to data and scientific software since 2004. It is an Allocating Agent for IGSN, the globally unique persistent identifier for physical samples, and has provided IGSN minting services since 2012. The repository provides DOI minting services for several global monitoring networks/observatories in geodesy and geophysics (e.g. INTERMAGNET; IAG Services ICGEM, IGETS, IGS; GEOFON) and collaborative projects (TERENO, EnMAP, GRACE, CHAMP), as well as the curation of long-tail data by domain specialists.

To increase the interoperability of long-tail data, GFZ Data Services (1) provides comprehensive domain-specific data descriptions via standardised and machine-readable metadata with controlled domain vocabularies; (2) complements the metadata with comprehensive and standardised technical data descriptions or reports; and (3) embeds the research data in a wider context by providing cross-references through Persistent Identifiers (DOI, IGSN, ORCID, Fundref) to related research products (text, data, software) and to the people or institutions involved.

For its data and software publication activities, GFZ Data Services provides an XML metadata editor that assists scientists in creating metadata in different international metadata schemas (ISO19115, DataCite), while remaining usable and understandable for the scientists (Ulbricht et al., 2017, 2020). With the launch of the new GFZ Data Services website in 2022, user guidance has increased significantly, and the website has developed from a searchable data portal into an information point for data publications and data management. This includes information on metadata, data formats, the data publication workflow, FAQ, links to different versions of our metadata editor and downloadable data description templates.

How to cite: Ott, F., Elger, K., and Frenzel, S.: Tools and incentives for the curation of geosciences data. Experiences from GFZ Data Services, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-12563, https://doi.org/10.5194/egusphere-egu22-12563, 2022.

09:18–09:25 | EGU22-8427 | ECS
Geertje ter Maat and Richard Wessels and the EPOS TCS Multi-scale Laboratories Team

The Multi-scale Laboratories (MSL) are a network of European laboratories bringing together the scientific fields of analogue modeling, paleomagnetism, rock and melt physics, geochemistry and microscopy. MSL is one of nine Thematic Core Services (TCS) of the European Plate Observing System (EPOS) (https://www.epos-eu.org/). The overarching goal of EPOS is to establish a comprehensive multidisciplinary research platform for the Earth sciences in Europe. It aims at facilitating the integrated use of data, models, and facilities, from both existing and new distributed pan-European Research Infrastructures, allowing open access and transparent (re-)use of data.

Laboratory facilities are an integral part of Earth science research. The diversity of methods employed in such infrastructures reflects the multi-scale nature of the Earth system and is essential for understanding its evolution, assessing geo-hazards, and sustainably exploiting geo-resources.

Experimental data from these laboratories provide the backbone for many scientific publications, but are often available only on request from the author, as supplementary information to research articles, or in a non-digital form (printed tables, figures), limiting data re-use, re-interpretation and availability. Moreover, the raw data often remain unpublished, inaccessible, and unpreserved for the long term.

The TCS MSL is committed to making Earth science laboratory data Findable, Accessible, Interoperable, and Reusable (FAIR). For this purpose, the TCS MSL encourages the community to share their data via DOI-referenced, citable data publications. To facilitate this and ensure the provision of rich metadata, we offer user-friendly tools, plus the necessary data management expertise, to support all aspects of data publishing for the benefit of individual lab researchers via partner repositories. Data published via TCS MSL are described with the use of sustainable metadata standards enriched with controlled vocabularies used in geosciences. The resulting data publications are also exposed through a designated TCS MSL online portal that brings together DOI-referenced data publications from partner research data repositories (https://epos-msl.uu.nl/). As such, successful efforts have already been made to interconnect new data (metadata exchange) with existing databases such as MagIC (paleomagnetic data in Earthref.org) and, in the future, we expect to broaden and improve this practice with other repositories. 

How to cite: ter Maat, G. and Wessels, R. and the EPOS TCS Multi-scale Laboratories Team: How to publish your data with the EPOS Multi-scale Laboratories data publication chain, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8427, https://doi.org/10.5194/egusphere-egu22-8427, 2022.

09:25–09:32 | EGU22-10321
Laurens Samshuijzen, Otto Lange, Ronald Pijnenburg, Kirsten Elger, Richard Wessels, Geertje ter Maat, Simone Frenzel, and Martyn Drury

The Thematic Core Service Multi-scale Laboratories (TCS MSL) is a community within the European Plate Observing System (EPOS) that includes a wide range of world-class laboratory infrastructures and that provides a cross-disciplinary, though coherent platform for virtual access to data and physical access to solid Earth science labs. The data produced at the participating laboratories are crucial to serving society’s need for geo-resources exploration and for protection against geo-hazards. To model resource formation and system behaviour during exploitation, researchers need an understanding from the molecular to the continental scale, based on experimental and analytical data.

Data coming from the MSL laboratories provide the backbone for scientific publications, but they are often available only as supplementary information to research articles. Moreover, the vast majority of the collected data remain unpublished, inaccessible, and often not sustainably preserved for the long term. To allow reuse of these valuable but often neglected data, the TCS MSL developed a full publication chain to support their FAIR dissemination and sustainable accessibility. This chain consists of a community-driven metadata standard that allows multiple discipline-specific detailed descriptions, a publication tool (metadata editor), and an online community portal that gives access to DOI-referenced data publications at multiple research data repositories related to the TCS MSL context (https://epos-msl.uu.nl/). The portal is built on the CKAN repository toolkit and is driven by the richness of the TCS MSL metadata standard. Besides its importance for the TCS MSL community, it also provides a showcase of how to set up the CKAN environment as a cross-disciplinary catalogue for FAIR metadata exchange through a cascade of infrastructures.

How to cite: Samshuijzen, L., Lange, O., Pijnenburg, R., Elger, K., Wessels, R., ter Maat, G., Frenzel, S., and Drury, M.: Findability of laboratory data in the solid Earth sciences: a portal for cross-disciplinary metadata, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-10321, https://doi.org/10.5194/egusphere-egu22-10321, 2022.

09:32–09:39 | EGU22-12874 | ECS
Bryant Ware, Alexander Prent, Samual Boone, Hayden Dalton, Guillaume Florin, Yoann Greau, Fabian Kohlmann, Moritz Theile, Wayne Noble, Erin Matchan, Barry Kohn, Andrew Gleadow, and Brent McInnes

One of the greatest challenges in the global geochemistry community is to aggregate the large amounts of geochemical data generated by laboratories and make them FAIR [Findable, Accessible, Interoperable, and Reusable] and publicly available. Standardisation and data organisation have often been an individual or voluntary/uncoordinated effort and/or motivated by the likelihood of immediate/near-future publication. Along with the technical challenges of getting laboratory data into a well-structured relational database and linked to samples’ metadata, societal and cultural issues are often present around the standardisation and accessibility of data reporting (e.g. equipment manufacturer, funding body proprietary data outputs, data reduction software accessibility and requirements/“data ownership” of the users/scientists).

In response to a nationally expressed need to address the challenges outlined above and to better organise and coordinate Australian geochemistry laboratories and data, AuScope funded the AuScope Geochemistry Network (AGN) in 2019. The AGN comprises a team of researchers, data scientists, and technical staff from three universities across Australia (Curtin University, the University of Melbourne, and Macquarie University), tasked with coordinating and strategizing the best approach to:

  • Unite the diverse Australian geochemistry community.
  • Promote national capability (existing geochemical capability).
  • Promote investment in infrastructure (new, advanced geochemical infrastructure).
  • Support increased end user access to laboratory facilities.
  • Support professional development via online tools, training courses and workshops.
  • Preserve legacy data sets.

Over the last two years the AGN has worked to organise the geochemistry community and to provide solutions for the integration and adoption of international best practices for data management. With the ‘end in mind’, the AGN and collaborator Lithodat have developed the AusGeochem platform, a unique research data platform that services laboratory needs, bridges the gap between sample metadata and analytical data, and strengthens the user-laboratory connection. To establish data reporting tables that fit the community’s needs while facilitating FAIR data practices and integrating international best practices for handling geochemistry data, the AGN led and coordinated Expert Advisory Groups composed of geochemical specialists from a number of Australian institutions. Alongside the AusGeochem platform, which allows laboratories to upload, archive, disseminate and publish their datasets, the AGN has built LabFinder, a web application that helps geoscience users find and access the right laboratory and analytical technique to solve their research questions. LabFinder aims to continue to support end user access to laboratory facilities, leading to improved capability and capacity of geochemistry laboratories on a national scale. In the coming two years the AGN will continue to build upon these accomplishments by expanding the AGN data partnerships through the onboarding of institutions hosting major geochemistry laboratories, further facilitating collaborations between Australian geochemistry laboratories.

How to cite: Ware, B., Prent, A., Boone, S., Dalton, H., Florin, G., Greau, Y., Kohlmann, F., Theile, M., Noble, W., Matchan, E., Kohn, B., Gleadow, A., and McInnes, B.: The AuScope Geochemistry Network: Facilitating Better Organisation, Coordination and Ability to Share Data Produced by Australian Geochemistry Laboratories, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-12874, https://doi.org/10.5194/egusphere-egu22-12874, 2022.

09:39–09:46 | EGU22-5985
Peter Löwe, Maris Nartišs, Jeff McKenna, and Astrid Emde

We report on the adoption of persistent identifiers by community-driven geospatial open source communities and their umbrella organisation OSGeo. After a ramp-up process, which included the introduction and evaluation of Digital Object Identifiers (DOI) for OSGeo conference recordings, a growing number of OSGeo project communities have started to adopt DOI for their respective code bases (e.g. MOSS GIS https://doi.org/10.5281/zenodo.5825144, GRASS GIS https://doi.org/10.5281/zenodo.5810537), enabling scientific reference and citation for distinct versions of code and also for the overall software project code base. In addition, the use of persistent IDs for persons is accelerating. This presentation provides an overview of the latest state of DOI use by projects, lessons learned, emerging challenges (e.g. DOI-based reference for software versions predating the DOI minting) and emerging opportunities for the OSGeo communities (e.g. integration of DOI for code, presentations, manuals and video) and beyond.
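
Citing a distinct software version by DOI presupposes that a DOI can be recognised in whichever form it appears. As a minimal sketch, assuming only the common `https://doi.org/` and `doi:` prefixes, the following Python normalises a reference and builds its resolver URL. The function names and the regular expression are illustrative and are not part of any OSGeo tooling; lowercasing relies on DOIs being case-insensitive.

```python
import re

# Illustrative pattern (an assumption, not an official grammar): an optional
# resolver or "doi:" prefix followed by "10.<registrant>/<suffix>".
DOI_PATTERN = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/|doi:)?(10\.\d{4,9}/\S+)$",
    re.IGNORECASE,
)


def normalize_doi(reference):
    """Return the bare, lowercased DOI contained in a reference string."""
    match = DOI_PATTERN.match(reference.strip())
    if match is None:
        raise ValueError(f"not a recognisable DOI: {reference!r}")
    return match.group(1).lower()


def resolver_url(doi):
    """Build the https://doi.org/ URL for a bare DOI."""
    return f"https://doi.org/{doi}"


# The GRASS GIS DOI quoted in the abstract above.
print(normalize_doi("https://doi.org/10.5281/zenodo.5810537"))
```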

How to cite: Löwe, P., Nartišs, M., McKenna, J., and Emde, A.: The new emerging DOI biotope within and around the Open Source Geospatial Foundation (OSGeo), EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-5985, https://doi.org/10.5194/egusphere-egu22-5985, 2022.

09:46–09:53 | EGU22-6573 | Virtual presentation
Rob Mellors and Chad Trabant and the DAS RCN Data Management Working Group

Distributed Acoustic Sensing (DAS) is a relatively recent technology that can collect seismic (and other) time series data using optical fibers as sensors. These optical fibers may be custom deployments or re-purposed telecommunication fibers. The range of applications is increasing rapidly, and recent studies include subsurface monitoring, earthquake hazard, geotechnical engineering, and ice flow. As the number of uses and studies increases, the need for archiving of the datasets is expected to increase as well. Archiving of DAS data faces multiple challenges at present. These include the need for large amounts (hundreds of TB) of storage, associated data transport and processing, and a standardized metadata format. As part of the DAS Research Coordination Network (RCN), a DAS data management working group is constructing a metadata model for DAS data that will address these needs. The objective is to develop a common metadata standard for archival purposes and to guide data collection at experiments. The metadata requirements include: 1) accommodation of most use cases (data collection scenarios); 2) permitting of cloud-based processing; 3) allowing of pre-processing; and 4) reduction of the burden of data transport. Standard metadata principles, such as findability, accessibility, interoperability, reusability (FAIR), and machine-readability, will be adhered to. The purpose of this presentation is to inform potential users of these efforts, encourage adoption of the proposed standard, and invite community input.
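
To give a sense of why DAS storage needs reach hundreds of terabytes, the following back-of-envelope sketch estimates the uncompressed volume of a recording. The channel count, sampling rate, sample size and duration are illustrative assumptions and are not figures from the working group.

```python
SECONDS_PER_DAY = 86_400
TB = 1e12  # decimal terabyte, in bytes


def das_volume_bytes(channels, sample_rate_hz, bytes_per_sample, duration_s):
    """Uncompressed volume: one sample per channel per sampling tick."""
    return channels * sample_rate_hz * bytes_per_sample * duration_s


# Hypothetical deployment: 10,000 channels sampled at 1 kHz,
# 4-byte samples, recorded continuously for 30 days.
volume = das_volume_bytes(10_000, 1_000.0, 4, 30 * SECONDS_PER_DAY)
print(f"{volume / TB:.1f} TB")  # prints "103.7 TB"
```

Under these assumptions a single month-long experiment already exceeds 100 TB, which illustrates why data transport and pre-processing appear among the metadata requirements.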

How to cite: Mellors, R. and Trabant, C. and the DAS RCN Data Management Working Group: DAS Data Management Challenges and Needs, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-6573, https://doi.org/10.5194/egusphere-egu22-6573, 2022.

09:53–10:00 | EGU22-4996 | Virtual presentation
Anette Ganske, Angelika Heil, Hannes Thiemann, and Andrea Lammert

The FAIR(1) data principles are important for the findability, accessibility, interoperability, and reusability of data. Therefore, many repositories make huge efforts to curate data so that they become FAIR, and assign DataCite(2) DOIs to archived data to increase findability. Nevertheless, recent investigations (Strecker(3), 2021) show that many datasets published with a DataCite DOI don’t meet all aspects of the FAIR principles, as their metadata lack information that is important for reuse and interoperability. Further examinations of data from the Earth System Sciences (ESS) reveal that automatic processability in particular is suboptimal, e.g. because of missing persistent identifiers for creators and affiliations, or missing information about geolocations or time ranges of simulations. In addition, many datasets have either no licence information or a non-open licence.
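
The metadata gaps listed above lend themselves to an automated check. The following is a minimal sketch, assuming a simplified record layout loosely modelled on DataCite JSON; the key names (in particular `temporalCoverage`) and the function `find_metadata_gaps` are hypothetical and do not reproduce the method used in the cited investigations.

```python
def find_metadata_gaps(record):
    """List the gaps named in the text for one (hypothetical) metadata record."""
    gaps = []
    for creator in record.get("creators", []):
        # A creator without e.g. an ORCID cannot be identified unambiguously.
        if not creator.get("nameIdentifiers"):
            gaps.append(f"creator '{creator.get('name', '?')}' has no persistent identifier")
        for affiliation in creator.get("affiliation", []):
            # An affiliation without e.g. a ROR ID is only a free-text name.
            if not affiliation.get("affiliationIdentifier"):
                gaps.append(f"affiliation '{affiliation.get('name', '?')}' has no persistent identifier")
    if not record.get("rightsList"):
        gaps.append("no licence information")
    if not record.get("geoLocations"):
        gaps.append("no geolocation")
    if not record.get("temporalCoverage"):  # hypothetical key for the time range
        gaps.append("no time range")
    return gaps


# Invented example record: licence present, everything else incomplete.
example = {
    "creators": [
        {
            "name": "Doe, Jane",
            "nameIdentifiers": [],
            "affiliation": [{"name": "Example Institute"}],
        }
    ],
    "rightsList": [{"rights": "CC BY 4.0"}],
    "geoLocations": [],
}

for gap in find_metadata_gaps(example):
    print(gap)
```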

The question arises of how datasets with open licences(4) and high-quality metadata can be highlighted so that they stand out from the crowd of published data. One solution is the newly developed branding for FAIR and open ESS data, called EASYDAB(5) (Earth System Data Branding). It consists of a logo that earmarks the landing pages of datasets that have a DataCite DOI, an open licence, open file formats(6) and rich metadata, and that were quality controlled by the responsible repository. The EASYDAB logo is protected and may only be used by repositories that agree to follow the EASYDAB Guidelines(7). These guidelines define principles on how to achieve high metadata quality of ESS datasets by demanding specific metadata information. Domain-specific quality guidelines define the mandatory metadata for a self-explanatory description of the data. One example is a quality guideline for atmospheric model data, the ATMODAT Standard(8). It prescribes the metadata not only for the files but also for the DOI and the landing page. The atmodat data checker(9) additionally helps data providers and repositories to check whether data files meet the requirements of the ATMODAT Standard.

The use of the EASYDAB logo is free of charge, but repositories must sign a contract with TIB – Leibniz Information Centre for Science and Technology(10). In the future, TIB will verify that datasets whose landing pages are highlighted with EASYDAB indeed follow the EASYDAB Guidelines, to ensure that EASYDAB remains a branding for high-quality data.

Using EASYDAB, repositories can indicate their efforts to publish data with high-quality metadata. The EASYDAB logo also indicates to data users that the dataset is quality controlled and can be easily reused.

1: https://www.go-fair.org/fair-principles/

2: https://datacite.org/

3: https://edoc.hu-berlin.de/bitstream/handle/18452/23590/BHR470_Strecker.pdf?sequence=1

4: https://opendefinition.org/licenses/

5: https://www.easydab.de

6: http://opendatahandbook.org/guide/en/appendices/file-formats/

7: https://cera-www.dkrz.de/WDCC/ui/cerasearch/entry?acronym=EASYDAB_Guideline_v1.1

8: https://doi.org/10.35095/WDCC/atmodat_standard_en_v3_0

9: https://github.com/AtMoDat/atmodat_data_checker

10: https://www.tib.eu/en/

How to cite: Ganske, A., Heil, A., Thiemann, H., and Lammert, A.: Why we need to highlight FAIR and Open Data and how to do it with EASYDAB, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-4996, https://doi.org/10.5194/egusphere-egu22-4996, 2022.

Coffee break
Chairpersons: Kirsten Elger, Lesley Wyborn
10:20–10:27
|
EGU22-11046
|
Virtual presentation
Angelika Heil, Andrea Lammert, and Anette Ganske

Atmospheric models are a relevant element of climate research. Access to atmospheric model data is of interest not only to a wide scientific community but also to public services, companies, politicians and citizens. The state-of-the-art approach to making the data available is to publish them via a data repository that adheres to the FAIR (Findable, Accessible, Interoperable, and Reusable) principles(1). A FAIR publication of research data implies that the data are described with rich metadata and that the data and metadata meet relevant discipline-specific standards.

A very comprehensive data standard has been developed for climate model output within the Climate Model Intercomparison Project (CMIP)(2), which largely builds upon the Climate and Forecast Metadata Conventions (CF Conventions)(3). Nevertheless, there are many areas of atmospheric modelling where data standardisation according to the CMIP standard is not possible or very difficult. To facilitate standardisation in these areas, the ATMODAT standard(4), a quality guideline for the FAIR and open publication of atmospheric model data, was recently established.

The ATMODAT standard defines a set of requirements that aim to ensure a high degree of reusability of published atmospheric model data. The requirements include the use of the netCDF file format(5), the application of the CF conventions(3), a data publication with a DataCite DOI(6), and rich and standardised metadata for the data files, the DOI and the landing page.

The atmodat data checker(7) was developed to support data producers in checking if their file metadata comply with the ATMODAT standard. 
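To give a flavour of what a file-metadata check involves, here is a minimal Python sketch. It is not the atmodat data checker and does not reproduce the ATMODAT requirements. It only tests the six global attributes that the CF Conventions recommend for describing file contents, plus the `Conventions` attribute. The function name and the plain-dictionary input are assumptions for this example; in practice the attributes would be read from a netCDF file.

```python
# Global attributes recommended by the CF Conventions for describing file contents.
CF_RECOMMENDED_GLOBALS = (
    "title", "institution", "source", "history", "references", "comment",
)


def check_global_attributes(attrs: dict) -> dict:
    """Report missing or empty recommended attributes and whether CF is declared.

    `attrs` is assumed to be a plain dict of a file's global attributes.
    """
    missing = [
        name for name in CF_RECOMMENDED_GLOBALS
        if not str(attrs.get(name, "")).strip()
    ]
    # CF-compliant files declare the convention version, e.g. "CF-1.8".
    declares_cf = "CF-" in str(attrs.get("Conventions", ""))
    return {"missing": missing, "declares_cf": declares_cf}


# Usage with made-up attribute values.
example_attrs = {
    "Conventions": "CF-1.8",
    "title": "Example model run",
    "institution": "Example Institute",
    "source": "",
    "history": "created for illustration",
}
print(check_global_attributes(example_attrs))
```

A full compliance check would also cover variable-level attributes and the additional metadata that the ATMODAT standard itself prescribes, which this sketch leaves out.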

We demonstrate the application of the ATMODAT standard to selected datasets from a building-resolving atmospheric model, the "grassroots" AEROCOM MIP, and weather pattern data derived from an atmospheric reanalysis. We explain the practical workflow involved in achieving an ATMODAT-compliant data publication and discuss the various challenges.

 

(1) https://doi.org/10.1038/sdata.2016.18
(2) https://doi.org/10.5194/gmd-13-201-2020
(3) https://cfconventions.org/
(4) https://doi.org/10.35095/WDCC/atmodat_standard_en_v3_0
(5) https://www.unidata.ucar.edu/software/netcdf/
(6) https://datacite.org/
(7) https://github.com/AtMoDat/atmodat_data_checker

How to cite: Heil, A., Lammert, A., and Ganske, A.: How to Prepare Atmospheric Model Data for Publication with the ATMODAT Standard, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-11046, https://doi.org/10.5194/egusphere-egu22-11046, 2022.

10:27–10:34
|
EGU22-11102
Matthes Rieke, Benedikt Gräler, and Simon Jirka

Data integration and harmonization have always been tedious tasks. The growth of available data in volume and variety has further increased the need for thorough data integration. Furthermore, the application of more and more automatic algorithms stresses the need for a sensible geo data platform to avoid the ‘garbage in, garbage out’ trap and to allow for meaningful data analysis. We reviewed different projects and learned about the various needs and constraints of joint spatial research data infrastructures, from local to cloud-based deployments. Typically, these systems are not designed from scratch, and existing systems need to be integrated or interfaced. As a result, or because of the need to support the sovereignty of distributed data centers, modern infrastructures need to be capable of supporting federated set-ups. Often these research data infrastructures are used not only to store raw data for scientists but also to provide results (maps, derived data products, tools and applications) to the public. This goes along with the need for access delegation (e.g. OAuth). A special focus is put on the provision of the joint datasets for machine learning applications. To facilitate efficient learning and prediction, an ML processing environment needs to be aligned with the data infrastructure.

We will present commonalities among these infrastructures and outline typical design patterns. A spatial data infrastructure based on open source software components that can be deployed on the cloud will be introduced. It features open standardized interfaces and services for easy adaptation and connectivity.

How to cite: Rieke, M., Gräler, B., and Jirka, S.: Joining Geo Data Across Different Providers to Ease Machine Learning Applications, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-11102, https://doi.org/10.5194/egusphere-egu22-11102, 2022.

10:34–10:41
|
EGU22-9964
James Ayliffe, Martina Stockhause, Shelley Stall, Deb Agarwal, Justin Buck, Caroline Coward, and Chris Erdmann

In the Earth and biological sciences, data are often preserved and made publicly available in data repositories, where the data are citable by DOIs and published under a Creative Commons CC-BY license. Researchers combine many datasets across disciplines, repositories, and regions to better understand processes, patterns, and drivers. Citing these many datasets is difficult: the large number does not fit into the references section of a paper, yet the licenses of the datasets require that credit is given to their creators.

 

The Data Citation Community of Practice (CoP) was formed to target such challenges in data citation and other scholarly work that will support indexing and measuring the impact. The CoP identified a container that holds the citations, together with its internal format, as a solution for large numbers of data citations; this container is referred to as a 'reliquary'. The existing dataset collection methods have been gathered and evaluated using concrete citation use cases. Requirements for the reliquary content have been identified and applied to the use cases. In this presentation, we will report on the current progress on an approach to building a reliquary.
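The container idea can be pictured with a small Python sketch. It is purely illustrative: the abstract does not define the reliquary's format, so the class, its fields, and the JSON layout below are all assumptions made for this example. The DOIs are placeholders.

```python
import json
from dataclasses import dataclass, field
from typing import List

DOI_RESOLVER = "https://doi.org/"


def normalise_doi(doi: str) -> str:
    """Strip a resolver prefix and lowercase (DOI names are case-insensitive)."""
    doi = doi.strip()
    if doi.lower().startswith(DOI_RESOLVER):
        doi = doi[len(DOI_RESOLVER):]
    return doi.lower()


@dataclass
class Reliquary:
    """Hypothetical container holding the DOIs of all datasets used in a study."""
    title: str
    dataset_dois: List[str] = field(default_factory=list)

    def add(self, doi: str) -> None:
        """Add a dataset DOI once, regardless of how it was written."""
        normalised = normalise_doi(doi)
        if normalised not in self.dataset_dois:
            self.dataset_dois.append(normalised)

    def to_json(self) -> str:
        """Serialise the container so that it could itself be archived and cited."""
        return json.dumps({"title": self.title, "datasets": self.dataset_dois}, indent=2)


# Usage with placeholder DOIs (not real datasets).
reliquary = Reliquary("Datasets used in a hypothetical study")
for doi in ["https://doi.org/10.1234/ABC", "10.1234/abc ", "10.1234/xyz"]:
    reliquary.add(doi)
print(reliquary.to_json())
```

In this sketch a paper would cite the container once, and credit would then have to be passed on to each listed dataset. That second step is the indexing problem the abstract goes on to describe.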

 

Reliquaries are an important part of enabling cross-disciplinary analysis of large amounts of data stored in many repositories. The challenge with a reliquary will be to design a method that works across diverse repositories and domain citation practices and to enhance the indexing system to direct credit to the reliquary content and authors. The CoP is in the process of setting up a Research Data Alliance (RDA) Working Group on Complex Citations in the Earth, Space, and Environmental Sciences to broaden the discussion and to find further use cases for evaluation and interested early adopters.

How to cite: Ayliffe, J., Stockhause, M., Stall, S., Agarwal, D., Buck, J., Coward, C., and Erdmann, C.: Citing large numbers of diverse datasets, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-9964, https://doi.org/10.5194/egusphere-egu22-9964, 2022.

10:41–10:48
|
EGU22-10143
|
Virtual presentation
Formalization of experimental setups for integration of heterogeneous data
(withdrawn)
Artem Vladimirov, Taras Vasiliev, Alexander Pashkov, and Nadezda Vasilyeva
10:48–10:55
|
EGU22-12187
|
ECS
|
Virtual presentation
Linda Baldewein, Kirsten Elger, Birgit Heim, Alexander Brauser, Simone Frenzel, Ulrike Kleeberg, and Ben Norden

The International Geo Sample Number (IGSN) is a globally unique and persistent identifier (PID) for physical samples and collections, with a discovery function on the Internet. IGSNs make it possible to link data and publications directly with the samples they originate from and thus close the last gap in the full provenance of research results. The modular IGSN metadata schema has a small number of mandatory and recommended metadata elements that can be individually extended with discipline-specific elements.

Based on three use cases that represent all states of digitisation, from individual scientists collecting sample descriptions in their field books to digital sample management systems fed by an app that is used in the field, FAIR WISH will (1) develop standardised and discipline-specific IGSN metadata schemes for different sample types from the Earth and Environmental Sciences, (2) develop workflows to generate machine-readable IGSN metadata from different states of digitisation, (3) develop workflows to automatically register IGSNs and (4) prepare the resulting workflows for further use in the Earth Science community.
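The step of turning a sample description into machine-readable metadata can be sketched in Python as follows. The element names are hypothetical placeholders and are not taken from the IGSN metadata schema; the sketch only illustrates the modular idea of mandatory elements, recommended elements, and a discipline-specific extension.

```python
# Hypothetical element names, chosen for illustration only (not the IGSN schema).
MANDATORY = ("identifier", "name", "sample_type")
RECOMMENDED = ("collector", "collection_date")


def build_sample_record(core: dict, extension: dict = None) -> dict:
    """Assemble one metadata record from a transcribed sample description.

    Raises ValueError if a mandatory element is missing or empty. Recommended
    elements are copied only when present. Discipline-specific elements are
    kept apart under an "extension" key.
    """
    missing = [key for key in MANDATORY if not core.get(key)]
    if missing:
        raise ValueError("missing mandatory elements: " + ", ".join(missing))
    record = {key: core[key] for key in MANDATORY}
    record.update({key: core[key] for key in RECOMMENDED if core.get(key)})
    if extension:
        record["extension"] = dict(extension)
    return record


# Usage with made-up values, e.g. one row transcribed from a field book.
row = {"identifier": "EXAMPLE0001", "name": "Core 1", "sample_type": "sediment core"}
print(build_sample_record(row, extension={"water_depth_m": 120}))
```

Keeping the extension separate from the core elements is one possible design. It lets a generic registration workflow ignore discipline-specific fields it does not understand.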

How to cite: Baldewein, L., Elger, K., Heim, B., Brauser, A., Frenzel, S., Kleeberg, U., and Norden, B.: FAIR WISH - FAIR Workflows to establish IGSN for Samples in the Helmholtz Association, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-12187, https://doi.org/10.5194/egusphere-egu22-12187, 2022.

10:55–11:05