
ESSI3.2

Digital data, software and samples are key inputs that underpin research and ultimately scholarly publications, and there are increasing expectations from policy makers and funders that they will be Open and FAIR (Findable, Accessible, Interoperable, Reusable). Open, accessible, high-quality data, software and samples are critical to ensure the integrity of published research and to facilitate reuse of these inputs in future scientific efforts. In Europe, adherence to the INSPIRE directive is gradually being enforced through national legislation, which also affects the data lifecycle in the Earth and environmental sciences.

These issues and challenges are being addressed at an increasing pace today, with journals changing their policies towards openness of data and software connected to publications, and with national, European and global initiatives and institutions developing ever more services around Open and FAIR data, covering curation, distribution and processing. Yet researchers, as producers as well as users of data, products and software, continue to struggle with the requirements and conditions they encounter in this evolving environment.

An inclusive, integrated approach to Open and FAIR is required, with consistent policies, standards and guidelines covering the whole research data lifecycle, and also addressing basic legal frameworks, e.g. for intellectual property and licensing. At the same time, the research community needs to further develop a common understanding of best practices and appropriate scientific conduct for this new era, and could share tools and techniques more effectively.

This session solicits papers from researchers, repositories, publishers, funders, policy makers and anyone with a story to share on the status and further evolution of an integrated, Open and FAIR research ecosystem.

Co-sponsored by AGU
Convener: Florian Haslinger | Co-conveners: Helen Glaves, Shelley Stall, Lesley Wyborn
Tue, 05 May, 08:30–10:15 (CEST)


Chat time: Tuesday, 5 May 2020, 08:30–10:15

D881 |
EGU2020-10349
| Highlight
Rahul Ramachandran, Kaylin Bugbee, and Kevin Murphy

Open science is a concept that represents a fundamental change in scientific culture. This change is characterized by openness, where research objects and results are shared as soon as possible, and by connectivity to a wider audience. What Open Science actually means, however, is understood differently by different stakeholders.

Thoughts on Open Science fall into four distinct viewpoints. The first viewpoint strives to make science accessible to a larger community by allowing non-scientists to participate in the research process through citizen science projects and by communicating research results to the broader public more effectively. The second viewpoint considers providing equitable knowledge access to everyone, covering access not only to journal publications but also to other objects in the research process, such as data and code. The third viewpoint focuses on making both the research process and the communication of results more efficient. This viewpoint has a social and a technical component: the social component is driven by the need to tackle complex problems that require collaboration and a team approach to science, while the technical component focuses on creating tools, services and especially scientific platforms to make the scientific process more efficient. Lastly, the fourth viewpoint strives to develop new metrics for scientific contributions that go beyond the current metrics derived solely from scientific publications, and that consider contributions from other research objects, such as data, code or knowledge sharing through blogs and other social media communication mechanisms.

Technological change is a factor in all four of these viewpoints on Open Science. New capabilities in compute, storage, methodologies, publication and sharing enable technologists to serve as primary drivers for Open Science by providing more efficient technological solutions. Sharing knowledge, information and other research objects such as data and code has become easier with new modalities of sharing available to researchers. In addition, technology is enabling the democratization of science at two levels. First, researchers are no longer constrained by a lack of the infrastructure resources needed to tackle difficult problems. Second, citizen science projects now involve the public at different steps of the scientific process, from data collection to analysis.

This presentation investigates the four viewpoints described above from the perspective of any large organization involved in scientific data stewardship and management. The presentation will list possible technological strategies that organizations may adopt to further align with all aspects of the Open Science movement.

How to cite: Ramachandran, R., Bugbee, K., and Murphy, K.: The Role of Data Systems to Enable Open Science, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10349, https://doi.org/10.5194/egusphere-egu2020-10349, 2020

D882 |
EGU2020-13291
Ari Asmi, Daniela Franz, and Andreas Petzold

The EU project ENVRI-FAIR builds on the Environmental Research Infrastructure (ENVRI) community, which includes the principal European producers and providers of environmental research data and research services. The ENVRI community integrates the four subdomains of the Earth system: Atmosphere, Ocean, Solid Earth, and Biodiversity/Terrestrial Ecosystems. The environmental research infrastructures (RIs) contributing to ENVRI-FAIR have developed comprehensive expertise in their fields of research, but their integration across the boundaries of applied subdomain science is still not fully developed. This integration is critical, however, for improving our current understanding of the major challenges to our planet, such as climate change and its impacts on the whole Earth system, our ability to respond to and predict natural hazards, and our understanding and prevention of ecosystem loss.


ENVRI-FAIR targets the development and implementation of the technical framework and policy solutions needed to make subdomain boundaries irrelevant for environmental scientists and to prepare Earth system science for the new paradigm of Open Science. Harmonization and standardization activities across disciplines, together with the implementation of joint data management and access structures at the RI level, facilitate the strategic coordination of the observation systems required for truly interdisciplinary science. ENVRI-FAIR will finally create an open access hub for environmental data and services provided by the contributing environmental RIs, utilizing the European Open Science Cloud (EOSC) as Europe's answer to the transition to Open Science.


How to cite: Asmi, A., Franz, D., and Petzold, A.: Building the Foundations for Open Applied Earth System Science in ENVRI-FAIR, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-13291, https://doi.org/10.5194/egusphere-egu2020-13291, 2020

D883 |
EGU2020-18570
Massimo Cocco, Daniele Bailo, Keith G. Jeffery, Rossana Paciello, Valerio Vinciarelli, and Carmela Freda

Interoperability has long been an objective for research infrastructures dealing with research data, in order to foster open access and open science. More recently, the FAIR principles (Findability, Accessibility, Interoperability and Reusability) have been proposed and are now reference criteria for promoting and evaluating the openness of scientific data. FAIRness is considered a necessary target for research infrastructures in different scientific domains at the European and global levels.

Solid Earth RIs have long been committed to engaging scientific communities in data collection, standardization and quality management, as well as to providing metadata and services for qualification, storage and accessibility. They are working to adopt FAIR principles, thus addressing the onerous task of turning these principles into practices. To make FAIR principles a reality in terms of service provision for data stewardship, some RI implementers in EPOS have proposed a FAIR-adoption process leveraging a four-stage roadmap that reorganizes the FAIR principles to better fit the mindset of scientists and RI implementers. The roadmap treats the FAIR principles as requirements in the software development life cycle and reorganizes them into data, metadata, access services and use services. Both the implementation and the assessment of FAIRness (by means of questionnaires and metrics) are thereby made simpler and closer to scientists' day-to-day work.
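As a rough illustration of the questionnaire-and-metrics style of FAIR self-assessment mentioned above, the sketch below scores a resource against a checklist grouped into the four roadmap stages. All checklist items, names and the scoring rule are invented for illustration; they are not the actual EPOS questionnaire.

```python
# Hypothetical staged FAIRness self-assessment. The four stage names follow
# the reorganisation described in the abstract (data, metadata, access
# services, use services); the checklist items themselves are invented.
STAGES = {
    "data": ["open licence attached", "persistent identifier assigned"],
    "metadata": ["standard schema used", "provenance recorded"],
    "access_services": ["machine-actionable API", "access conditions documented"],
    "use_services": ["visualisation service", "processing service"],
}

def fairness_report(answers):
    """answers maps checklist items to True/False (questionnaire style).
    Returns per-stage completion ratios plus an overall ratio."""
    report = {}
    for stage, items in STAGES.items():
        done = sum(1 for item in items if answers.get(item, False))
        report[stage] = done / len(items)
    report["overall"] = sum(report[s] for s in STAGES) / len(STAGES)
    return report
```

A stage-by-stage report of this kind makes gaps visible per roadmap stage rather than collapsing FAIRness into a single yes/no answer.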

FAIR data and service management is demanding: it requires resources and skills and, more importantly, sustainable IT resources. For this reason, FAIR data management is challenging for many research infrastructures and data providers seeking to turn FAIR principles into reality through viable and sustainable practices. FAIR data management also includes implementing services to access data as well as to visualize, process, analyse and model them for generating new scientific products and discoveries.

FAIR data management is challenging for Earth scientists because it depends on their perception of finding, accessing and using data and scientific products: in other words, on their perception of data sharing. The sustainability of FAIR data and service management is not limited to financial sustainability and funding; it also includes legal, governance and technical issues that concern the scientific communities.

In this contribution, we present and discuss some of the main challenges that need to be urgently tackled in order to run and operate FAIR data services in the long term, as also envisaged by the European Open Science Cloud initiative: a) the sustainability of the IT solutions and resources supporting practices for FAIR data management (i.e., PID usage and preservation, including the costs of operating the associated IT services); b) re-usability, which on the one hand requires clear and tested methods to manage heterogeneous metadata and provenance, and on the other hand can be considered a frontier research field; c) FAIR service provision, which raises many open questions about applying the FAIR principles to services for data stewardship and to services that create data products from FAIR raw data, for which it is not clear how the FAIRness of the resulting data products can still be guaranteed.

How to cite: Cocco, M., Bailo, D., Jeffery, K. G., Paciello, R., Vinciarelli, V., and Freda, C.: Sustainable FAIR Data management is challenging for RIs and it is challenging to solid Earth scientists, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18570, https://doi.org/10.5194/egusphere-egu2020-18570, 2020

D884 |
EGU2020-13475
Chad Trabant, Rick Benson, Rob Casey, Gillian Sharer, and Jerry Carter

The data center of the National Science Foundation's Seismological Facility for the Advancement of Geoscience (SAGE), operated by IRIS Data Services, has evolved over the past 30 years to address the data accessibility needs of the scientific research community. In recent years, a broad call for adherence to the FAIR data principles has prompted increased activity among repositories to support them. As these principles are well aligned with the needs of data users, many of them are already supported and actively promoted by IRIS. Standardized metadata and data identifiers support findability. Open and standardized web services enable a high degree of accessibility. Interoperability is ensured by offering data in rich, domain-specific formats in combination with simple, text-based formats. The use of open, rich (domain-specific) format standards enables a high degree of reuse. Further advancement towards these principles includes the introduction and dissemination of DOIs for data, and the introduction of Linked Data support via JSON-LD, allowing scientific data brokers, catalogers and generic search systems to discover data. Naturally, some challenges remain, such as the granularity and mechanisms needed for persistent IDs for data; the reality that metadata are updated with corrections (which has implications for the FAIR data principles); and the complexity of data licensing in a repository with data contributed by individual PIs, national observatories and international collaborations. In summary, IRIS Data Services is well along the path of adherence to the FAIR data principles, with more work to do. We will present the current status of these efforts and describe the key challenges that remain.
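To give a concrete flavour of the JSON-LD Linked Data support mentioned above, the snippet below serialises a schema.org `Dataset` record, the general kind of markup that generic search systems and data brokers crawl. Every field value is an invented placeholder, not an actual IRIS/SAGE record.

```python
import json

# Illustrative schema.org "Dataset" record serialised as JSON-LD.
# All names, DOIs and URLs below are placeholders for illustration only.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example broadband seismic waveform dataset (placeholder)",
    "identifier": "https://doi.org/10.xxxx/placeholder",  # placeholder DOI
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "distribution": {
        "@type": "DataDownload",
        "encodingFormat": "application/vnd.fdsn.mseed",  # registered miniSEED media type
        "contentUrl": "https://example.org/fdsnws/dataselect/1/query",
    },
}

jsonld = json.dumps(dataset, indent=2)
```

Embedded in a landing page, a record like this lets a catalog harvest the DOI, license and download endpoint without any repository-specific API knowledge.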

How to cite: Trabant, C., Benson, R., Casey, R., Sharer, G., and Carter, J.: Status and challenges of FAIR data principles for a long-term repository, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-13475, https://doi.org/10.5194/egusphere-egu2020-13475, 2020

D885 |
EGU2020-4169
| Highlight
David Carlson, Kirsten Elger, Jens Klump, Ge Peng, and Johannes Wagner

Envisioned as one solution to the data challenges of the International Polar Year (2007-2008), the Copernicus data journal Earth System Science Data (ESSD) has developed into a useful, rewarding data-sharing option for an unprecedented array of researchers. ESSD has published peer-reviewed descriptions of more than 500 easily and freely accessible data products, from more than 4000 data providers archiving their products at more than 100 data centres. ESSD processes and products represent a useful step toward Findable, Accessible, Interoperable, Reusable (FAIR) expectations, but also offer a caution about implementation.

For ESSD, findable and accessible derive from the journal’s consistent mandate for open access coupled with useful title, author, abstract and full-text search functions on the publisher’s website (which lead users quickly to data sources) and excellent (but varied) topical, geographic, textual and chronologic search functions of host data centres. Due to an intense focus on data reliability and reusability during peer review of data descriptions, ESSD-referenced data products achieve very high standards of accessibility and reusability. ESSD experience over an amazing variety of data products suggests that ‘interoperability’ depends on the intended use of the data and experience of users. Many ESSD-published products adopt a shared grid format compatible with climate models. Other ESSD products, for example in ocean biogeochemistry or land agricultural cultivation, adopt or even declare interoperable terminologies and new standards for expression of uncertainty. Very often an ESSD publication explicitly describes data collections intended to enhance interoperability within a specific user community, through a new database for example. For a journal that prides itself on diversity and quality of its products published in service to a very broad array of oceanographic, terrestrial, atmospheric, cryospheric and global research communities, the concept of interoperability remains elusive.

Implementing open access to data has proven difficult. The FAIR principles give us guidelines on the technical implementation of open data. However, ESSD's experience (involving publisher, data providers, reviewers and data centres) in achieving very high impact factors (we consider these metrics indicators of the use and reuse of data products published via ESSD) can serve as a guide to the pursuit of the FAIR principles. For most researchers, data handling remains confusing and unrewarding. Data centres vary widely in capability, resources and approaches; even the 'best' (busiest) may change policies or practices according to internal needs, independent of external standards, or may unexpectedly go out of service. Software and computation resources grow and change rapidly, with simultaneous advances in open and proprietary tools. National mandates often conflict with international standards. Although we contend that ESSD represents one sterling example of promoting findable, accessible, interoperable and reusable data of high quality, we caution that those objectives remain a nebulous goal for any institution - in our case a data journal - whose measure of success remains a useful service to a broad research community.

How to cite: Carlson, D., Elger, K., Klump, J., Peng, G., and Wagner, J.: Practical data sharing with tangible rewards through publication in ESSD, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-4169, https://doi.org/10.5194/egusphere-egu2020-4169, 2020

D886 |
EGU2020-8463
Daniel Neumann, Anette Ganske, Vivien Voss, Angelina Kraft, Heinke Höck, Karsten Peters, Johannes Quaas, Heinke Schluenzen, and Hannes Thiemann

The generation of high-quality research data is expensive. The FAIR principles were established to foster the reuse of such data for the benefit of the scientific community and beyond. Publishing research data with metadata and DataCite DOIs in public repositories makes them findable and accessible (the F and A of FAIR). However, DOIs and basic metadata do not guarantee that the data are actually reusable without discipline-specific knowledge: data may be saved in proprietary or undocumented file formats, detailed discipline-specific metadata may be missing, and quality information on the data and metadata may not be provided. In this contribution, we present ongoing work in the AtMoDat project, a consortium of atmospheric scientists and infrastructure providers that aims to improve the reusability of atmospheric model data.
  
Consistent standards are necessary to simplify the reuse of research data. Although standardization of file structure and metadata is well established for some subdomains of the Earth system modeling community – e.g. CMIP – several other subdomains lack such standardization. Hence, scientists from the Universities of Hamburg and Leipzig and infrastructure operators cooperate in the AtMoDat project to advance standardization of model output files in specific subdomains of the atmospheric modeling community. Starting from the demanding CMIP6 standard, the aim is to establish an easy-to-use standard that is at least compliant with the Climate and Forecast (CF) conventions. In parallel, an existing netCDF file convention checker is being extended to check against the new standards. This enhanced checker is designed to support the creation of compliant files and thus lower the hurdle for data producers to comply with the new standard. The transfer of this approach to further subdisciplines of the Earth system modeling community will be supported by a best-practice guide and other documentation. A showcase of a standard for the urban atmospheric modeling community will be presented in this session. The standard is based on the CF Conventions and adapts several global attributes and controlled vocabularies from the well-established CMIP6 standard.
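The kind of check such a convention checker performs on a file's global attributes can be sketched as below. The required-attribute list is a simplified illustration drawing on common CF practice; it is not the actual AtMoDat checker or standard.

```python
# Simplified sketch of a global-attribute check in the spirit of a netCDF
# convention checker. Attribute names follow common CF practice; the exact
# rule set is invented for illustration.
REQUIRED_GLOBAL_ATTRS = {"Conventions", "title", "institution", "source", "history"}

def check_global_attrs(attrs):
    """Return a set of problems found in a file's global attributes."""
    problems = {f"missing attribute: {a}" for a in REQUIRED_GLOBAL_ATTRS - attrs.keys()}
    if not attrs.get("Conventions", "").startswith("CF-"):
        problems.add("Conventions must reference a CF version, e.g. 'CF-1.8'")
    return problems

attrs = {
    "Conventions": "CF-1.8",
    "title": "Urban canopy temperature simulation (illustrative)",
    "institution": "Example institute",
    "source": "Example model v1.0",
    "history": "2020-05-05: created",
}
problems = check_global_attrs(attrs)  # empty set -> this check passes
```

In a real checker the attributes would be read from the netCDF file itself (e.g. via the netCDF4 or xarray libraries) and many more rules, including variable-level ones, would apply.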
  
Additionally, the AtMoDat project aims to introduce a generic quality indicator into the DataCite metadata schema to foster further reuse of data. This quality indicator should be backed by a discipline-specific implementation of a quality standard linked to the indicator. We will present the concept of the generic quality indicator both in general and in the context of urban atmospheric modeling data.

How to cite: Neumann, D., Ganske, A., Voss, V., Kraft, A., Höck, H., Peters, K., Quaas, J., Schluenzen, H., and Thiemann, H.: AtMoDat: Improving the reusability of ATmospheric MOdel DATa with DataCite DOIs paving the path towards FAIR data, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-8463, https://doi.org/10.5194/egusphere-egu2020-8463, 2020

D887 |
EGU2020-13071
Robert Huber and Jens Klump

“We kill people based on metadata.” (Gen. Michael V. Hayden, 2014) [1]

Over the past fifteen years, a number of persistent identifier (PID) systems have been built to help identify the stakeholders and their outputs in the research process and in scholarly communication. Transparency is a fundamental principle of science, but it can be in conflict with the right to privacy. The development of Knowledge Graphs (KGs), however, introduces completely new, and possibly unintended, uses of publication metadata that require critical discussion. In particular, when personal data, such as those linked with ORCID identifiers, are combined with research artefacts and personal information, KGs make it possible to identify the personal as well as the collaborative networks of individuals. This ability to analyse KGs may be used in a harmful way. It is a sad fact that in some countries, personal relationships or research in certain subject areas can lead to discrimination, persecution or imprisonment. We must, therefore, become aware of the risks and responsibilities that come with networked data in KGs.

The trustworthiness of PID systems and KGs has so far been discussed in technical and organisational terms. The inclusion of personal data requires a new definition of ‘trust’ in the context of PID systems and Knowledge Graphs which should also include ethical aspects and consider the principles of the General Data Protection Regulation.

New, trustworthy technological approaches are required to ensure the proper maintenance of privacy. As a prerequisite, the level of interoperability between PID systems needs to be enhanced. Further, new methods and protocols need to be defined that enable secure and prompt cascading update or delete actions on personal data across PID systems as well as knowledge graphs.
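The cascading-delete idea above can be sketched in miniature. Everything here is hypothetical: the registry names, record layout and the anonymise-rather-than-delete rule are invented for illustration, and no real PID system works exactly like this.

```python
# Hypothetical cascading delete of personal data across two linked,
# in-memory "registries". Purely illustrative of the protocol idea.
registries = {
    "person_registry": {"P-1": {"name": "A. Researcher", "links": ["sample_registry:S-9"]}},
    "sample_registry": {"S-9": {"collector": "P-1"}},
}

def cascade_delete(person_id, registries):
    """Remove a personal record and scrub references to it in linked records."""
    record = registries["person_registry"].pop(person_id, None)
    if record is None:
        return False
    for link in record["links"]:
        registry_name, record_id = link.split(":")
        target = registries[registry_name].get(record_id)
        if target is not None and target.get("collector") == person_id:
            # Anonymise the reference rather than deleting the sample record,
            # so the research artefact itself remains citable.
            target["collector"] = None
    return True
```

In a distributed setting the same operation would have to be an authenticated, auditable protocol between independent PID services rather than a function call, which is exactly the gap the abstract points to.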

Finally, new trustworthiness criteria must be defined which allow the identification of trusted clients for the exchange of personal data instead of the currently practised open data policy which can be in conflict with legislation protecting privacy and personal data.

[1] https://www.nybooks.com/daily/2014/05/10/we-kill-people-based-metadata/

How to cite: Huber, R. and Klump, J.: The Dark Side of the Knowledge Graph - How Can We Make Knowledge Graphs Trustworthy?, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-13071, https://doi.org/10.5194/egusphere-egu2020-13071, 2020

D888 |
EGU2020-9207
Carsten Hoffmann, Xenia Specka, Nikolai Svoboda, and Uwe Heinrich

In the frame of the joint research project BonaRes (“Soil as a sustainable resource for the bioeconomy”, bonares.de), a data repository was set up to upload, manage, and provide soil, agricultural and accompanying environmental research data. Research data are stored within the repository over the long term, consistently and based on open and widely used standards. Data visibility, as well as accessibility, reusability and interoperability with international data infrastructures, is fostered by rich description with standardized metadata and DOI allocation.

The specially developed metadata schema combines all elements from DataCite and INSPIRE. Metadata are entered via an online metadata editor and include thesauri (AGROVOC, GEMET), use licenses (Creative Commons: CC-BY for research data, CC0 for metadata), lineage elements and data access points (a geodata portal with OGC services). The repository thus meets the requirements of the FAIR principles for research data.
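A record in such a combined schema might look roughly as follows: DataCite-style citation elements alongside INSPIRE-style lineage and spatial elements. All field names and values are simplified placeholders, not the actual BonaRes schema.

```python
# Illustrative combined metadata record. DataCite-like and INSPIRE-like
# elements are shown side by side; the layout is invented for illustration.
record = {
    # DataCite-like citation elements
    "identifier": {"identifierType": "DOI", "value": "10.xxxx/placeholder"},  # placeholder DOI
    "creators": [{"name": "Doe, Jane"}],
    "titles": ["Example long-term soil monitoring dataset"],
    "publicationYear": 2020,
    "rights": "CC-BY-4.0",  # data licence; the metadata themselves would be CC0
    # INSPIRE-like elements
    "lineage": "Soil samples collected annually; laboratory analysis per standard protocol.",
    "boundingBox": {"west": 5.9, "east": 15.0, "south": 47.3, "north": 55.1},
    "keywords": {"thesaurus": "AGROVOC", "terms": ["soil fertility"]},
}

def is_citable(rec):
    """Minimal check: a record is citable if it carries a DOI and a creator."""
    return rec["identifier"]["identifierType"] == "DOI" and bool(rec["creators"])
```

Keeping citation elements and geospatial lineage in one record is what lets the same metadata feed both DOI-based citation and OGC-style geodata discovery.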

In this paper we present and discuss the functionalities and elements of the BonaRes Data Repository, show a typical data workflow from data owner to data (re-)user, demonstrate data accessibility and citability, and introduce central data policy elements, e.g. embargo times and licenses. Finally, we provide an outlook on the planned integration and linkage with other soil and agricultural repositories within the government-funded comprehensive German national research data infrastructure NFDI (NFDI4Agri).

How to cite: Hoffmann, C., Specka, X., Svoboda, N., and Heinrich, U.: FAIR access to soil and agricultural research data: The BonaRes Data Repository, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-9207, https://doi.org/10.5194/egusphere-egu2020-9207, 2020

D889 |
EGU2020-10057
Mario Locati, Francesco Mariano Mele, Vincenzo Romano, Placido Montalto, Valentino Lauciani, Roberto Vallone, Giuseppe Puglisi, Roberto Basili, Anna Grazia Chiodetti, Antonella Cianchi, Massimiliano Drudi, Carmela Freda, Maurizio Pignone, and Agata Sangianantoni

The Istituto Nazionale di Geofisica e Vulcanologia (INGV) has a long tradition of sharing scientific data, well before the Open Science paradigm was conceived. In the last thirty years, a great deal of geophysical data generated by research projects and monitoring activities were published on the Internet, though encoded in multiple formats and made accessible using various technologies.

To organise such a complex scenario, a working group (PoliDat) tasked with implementing an institutional data policy operated from 2015 to 2018. PoliDat published three documents: in 2016, the data policy principles; in 2017, the rules for scientific publications; in 2018, the rules for scientific data management. These documents are available online in Italian and English (https://data.ingv.it/docs/).

According to a preliminary data survey performed between 2016 and 2017, nearly 300 different types of INGV-owned data were identified. In the survey, the compilers were asked to declare all available scientific data, differentiated by the level of intellectual contribution: level 0 identifies raw data generated by fully automated procedures, level 1 identifies data products generated by semi-automated procedures, level 2 relates to data resulting from scientific investigations, and level 3 is associated with integrated data resulting from complex analysis.
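The four intellectual-contribution levels can be captured as a simple lookup, useful for tagging records in a registry. The descriptions paraphrase the survey definitions above; the helper function is an invented illustration.

```python
# The four data levels from the INGV survey, encoded as a lookup table.
DATA_LEVELS = {
    0: "raw data generated by fully automated procedures",
    1: "data products generated by semi-automated procedures",
    2: "data resulting from scientific investigations",
    3: "integrated data resulting from complex analysis",
}

def describe(level):
    """Human-readable label for a data level, e.g. describe(0)."""
    if level not in DATA_LEVELS:
        raise ValueError(f"unknown data level: {level}")
    return f"level {level}: {DATA_LEVELS[level]}"
```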

A Data Management Office (DMO) was established in November 2018 to put the data policy into practice. The DMO's first goal was to design and establish a Data Registry that satisfies the highly differentiated requirements of both internal and external users, at both scientific and managerial levels. The Data Registry is defined as a metadata catalogue, i.e., a container of data descriptions, not of the data themselves. In addition, the DMO supports other activities dealing with scientific data, such as checking contracts, advising the legal office in cases of litigation, interacting with the INGV Data Transparency Office and, in more general terms, supporting the adoption of Open Science principles.

An extensive set of metadata has been identified to accommodate multiple metadata standards. First, a preliminary set of metadata describing each dataset is compiled by the authors using a web-based interface; the metadata are then validated by the DMO; and finally, a DataCite DOI is minted for each dataset, if one is not already present. The Data Registry is publicly accessible via a dedicated web portal (https://data.ingv.it). A pilot phase aimed at testing the Data Registry was carried out in 2019 and involved a limited number of contributors. To this end, a top-priority data subset was identified according to the relevance of the data within the mission of INGV and the completeness of the already available information. The Directors of the Departments of Earthquakes, Volcanoes, and Environment supervised the selection of the data subset.

The pilot phase helped to test and adjust the decisions made and procedures adopted during the planning phase, and allowed us to fine-tune the tools for data management. Next year, the Data Registry will enter its production phase and will be open to contributions from all INGV employees.

How to cite: Locati, M., Mele, F. M., Romano, V., Montalto, P., Lauciani, V., Vallone, R., Puglisi, G., Basili, R., Chiodetti, A. G., Cianchi, A., Drudi, M., Freda, C., Pignone, M., and Sangianantoni, A.: Putting the INGV data policy into practice: considerations after the first-year experience, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10057, https://doi.org/10.5194/egusphere-egu2020-10057, 2020

D890 |
EGU2020-12001
Jens Klump, Kerstin Lehnert, Lesley Wyborn, and Sarah Ramdeen and the IGSN 2040 Steering Committee

Like many research data infrastructures, the IGSN Global Sample Number started as a research project. The rapid uptake of the IGSN in the last five years, as well as the expanding diversity of use cases, in particular beyond the geosciences, mean that the IGSN has outgrown its current structure as implemented in 2011, and the technology is in urgent need of a refresh. The expected exponential growth of the operation requires the IGSN Implementation Organization (IGSN e.V.) to better align its organisation and technical architecture.

In 2018, the Alfred P. Sloan Foundation awarded a grant to redesign and improve the IGSN, to “achieve a trustworthy, stable, and adaptable architecture for the IGSN as a persistent unique identifier for material samples, both technically and organizationally, that attracts, facilitates, and satisfies participation within and beyond the Geosciences, that will be a reliable component of the evolving research data ecosystem, and that is recognized as a trusted partner by data infrastructure providers and the science community alike.” 

IGSN is not the first PID service provider to make the transition from project to product and there are lessons to be learnt from other PID services. To this end, the project invited experts in the field of research data infrastructures and facilitated workshops to develop an organisational and technological strategy and roadmap towards long-term sustainability of the IGSN. 

To be sustainable, a research data infrastructure like the IGSN has to have a clearly defined service or product, underpinned by a scalable business model and technical system. We used the Lean Canvas to define the IGSN services. The resulting definition of service components helped us define IGSN user communities, cost structures and potential income streams. The workshop discussions had already highlighted the conflicting aims between offering a comprehensive service and keeping services lean to reduce development and operational costs. Building on the Lean Canvas, the definition of a minimum viable product helped to delineate the role of the IGSN e.V. and the roles of actors offering value-added services based on the IGSN outside of the core operation.

How to cite: Klump, J., Lehnert, K., Wyborn, L., and Ramdeen, S. and the IGSN 2040 Steering Committee: Building a sustainable international research data infrastructure - Lessons learnt in the IGSN 2040 project, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12001, https://doi.org/10.5194/egusphere-egu2020-12001, 2020

D891 |
EGU2020-12419
Nancy Ritchey

The U.S. National Oceanic and Atmospheric Administration’s (NOAA) National Centers for Environmental Information (NCEI) stewards one of the world’s largest and most diverse collections of environmental data. The longevity of this organization has led to a great diversity of digital and physical data in multiple formats and media. NCEI strives to develop and implement processes, guidance, tools and services to facilitate the creation and preservation of independently understandable data that is open and FAIR (Findable, Accessible, Interoperable, Reusable).

The Foundations for Evidence-Based Policymaking Act (Evidence Act) (PL 115-435), which includes the Open, Public, Electronic, and Necessary Government Data (OPEN) Act (2019), requires all U.S. Federal data to be shared openly. Meeting the requirements of the Evidence Act, FAIR and OPEN presents many challenges. One challenge is that the requirements are not static: they evolve over time with the data lifecycle, with changes within the designated user community (e.g., user needs and skills), and with the transition to new technologies such as the cloud. Consistently measuring and documenting compliance is another challenge.

NCEI is tackling the challenges of ensuring that our data holdings meet the requirements of OPEN, FAIR and the Evidence Act in multiple areas, through the consistent implementation of community best practices, knowledge of current and potential user communities, and elbow grease.

This presentation will focus on NCEI’s experiences with taking data beyond independently understandable to meeting the Evidence Act, FAIR, and OPEN.

 

How to cite: Ritchey, N.: NOAA/NCEI‘s Challenges in Meeting New Open Data Requirements, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12419, https://doi.org/10.5194/egusphere-egu2020-12419, 2020

D892 |
EGU2020-13285
Ivonne Anders, Andrea Lammert, and Karsten Peters

In 2019 the Universität Hamburg was awarded funding for 4 clusters of excellence in the Excellence Strategy of the Federal and State Governments. One of these clusters, funded by the German Research Foundation (DFG), is “CliCCS – Climate, Climatic Change, and Society”. The scientific objectives of CliCCS are pursued within three intertwined research themes, on the Sensitivity and Variability in the Climate System, Climate-Related Dynamics of Social Systems, and Sustainable Adaption Scenarios. Each theme is structured into multiple projects addressing sub-objectives of that theme. More than 200 researchers from the Universität Hamburg, as well as from connected research centers and partner institutions, are involved, and almost all of them use, and above all produce, new data.

Research data is produced with great effort and is therefore one of the valuable assets of scientific institutions. It is part of good scientific practice to make research data freely accessible and available in the long term as a transparent basis for scientific statements.

Within the interdisciplinary cluster of excellence CliCCS, the types of research data are very diverse. The data range from results of physically based dynamic ocean and atmosphere models, to measurement data from the coastal area, to survey and interview data from the field of sociology.

The German Climate Computing Center (DKRZ) takes care of the research data management and supports the researchers in creating data management plans, adhering to naming conventions, or simply finding the optimal repository to publish the data. The goal is not only the storage and long-term archiving of the data, but also to ensure the quality of the data and thus to facilitate potential reuse.

How to cite: Anders, I., Lammert, A., and Peters, K.: The challenging research data management support in the interdisciplinary cluster of excellence CliCCS, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-13285, https://doi.org/10.5194/egusphere-egu2020-13285, 2020

D893 |
EGU2020-15358
Richard Wessels and Otto Lange and the EPOS TCS Multi-scale laboratories Team

EPOS (European Plate Observing System) is an ESFRI Landmark and European Research Infrastructure Consortium (ERIC). The EPOS Thematic Core Service Multi-scale laboratories (TCS MSL) represents a community of European solid Earth sciences laboratories including high temperature and pressure experimental facilities, electron microscopy, micro-beam analysis, analogue tectonic and geodynamic modelling, paleomagnetism, and analytical laboratories.

Participants and collaborating laboratories from Belgium, Bulgaria, France, Germany, Italy, Norway, Portugal, Spain, Switzerland, The Netherlands, and the UK are already organized in the TCS MSL. Unaffiliated European solid Earth sciences laboratories are welcome and encouraged to join the growing TCS MSL community. Members of the TCS MSL are also represented in the EPOS Sustainability Phase (SP).

Laboratory facilities are an integral part of Earth science research. The diversity of methods employed in such infrastructures reflects the multi-scale nature of the Earth system and is essential for the understanding of its evolution, for the assessment of geo-hazards, and for the sustainable exploitation of geo-resources.

Although experimental data from these laboratories often provide the backbone for scientific publications, the data are usually available only as supplementary information to research articles. As a result, much of the collected data remains unpublished, inaccessible, and often not preserved for the long term.

The TCS MSL is committed to making Earth science laboratory data Findable, Accessible, Interoperable, and Reusable (FAIR). For this purpose the TCS MSL has developed an online portal that brings together DOI-referenced data publications from research data repositories related to the TCS MSL context (https://epos-msl.uu.nl/).
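The portal's actual ingestion pipeline is not described here, so any concrete code is an assumption. As an illustration only, the Python sketch below extracts DOI and title fields from a registry response in the generic JSON:API shape used by DOI registries such as DataCite; the sample record is hypothetical.

```python
import json

def extract_records(registry_json):
    """Pull (DOI, title) pairs out of a DataCite-style JSON:API response.

    Illustrative only: field names follow the generic DataCite response
    shape (top-level "data" list, per-item "attributes" dict), not any
    documented TCS MSL interface.
    """
    records = []
    for item in registry_json.get("data", []):
        attrs = item.get("attributes", {})
        titles = attrs.get("titles") or [{}]
        records.append((attrs.get("doi"), titles[0].get("title")))
    return records

# Hypothetical sample record in the shape described above.
sample = json.loads("""
{"data": [{"attributes": {"doi": "10.1234/example",
                          "titles": [{"title": "Rock deformation dataset"}]}}]}
""")
print(extract_records(sample))  # [('10.1234/example', 'Rock deformation dataset')]
```

A portal aggregating DOI-referenced publications would run such an extraction per harvested repository and index the resulting pairs for search.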

In addition, the TCS MSL has developed a Trans-national Access (TNA) program that allows researchers and research teams to apply for physical or remote access to the participating EPOS MSL laboratories. Three pilot calls were launched in 2017, 2018, and 2019, with a fourth call scheduled for 2020. The pilot calls were used to develop and refine the EPOS-wide TNA principles and to initialize an EPOS brokering service, where information on each facility offering access is made available to users and where calls for proposals are advertised. Access to the participating laboratories is currently supported by national funding or in-kind contributions. Based on the EPOS Data Policy & TNA General Principles, access to the laboratories is regulated by common rules and a transparent policy, including procedures and mechanisms for application, negotiation, proposal evaluation, user feedback, use of laboratory facilities and data curation.

Access to EPOS Multi-scale laboratories is a unique opportunity to create new synergy, collaboration and innovation, in a framework of trans-national access rules.

One example of such a successful collaboration is that between MagIC and the EPOS TCS MSL, which will allow paleomagnetic data and metadata to be exchanged between EPOS and the MagIC (https://www.earthref.org/MagIC) database. Such collaborations are beneficial to all parties involved and support the harmonization and integration of data at a global scale.

How to cite: Wessels, R. and Lange, O. and the EPOS TCS Multi-scale laboratories Team: EPOS Multi-scale laboratories Data Services & Trans-national access program, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-15358, https://doi.org/10.5194/egusphere-egu2020-15358, 2020

D894 |
EGU2020-18398
Anna Miglio, Carine Bruyninx, Andras Fabian, Juliette Legrand, Eric Pottiaux, Inge Van Nieuwerburgh, and Dries Moreels

Nowadays, we measure positions on Earth’s surface thanks to Global Navigation Satellite Systems (GNSS) e.g. GPS, GLONASS, and Galileo. Activities such as navigation, mapping, and surveying rely on permanent GNSS tracking stations located all over the world.
The Royal Observatory of Belgium (ROB) maintains and operates a repository containing data from hundreds of GNSS stations belonging to the European GNSS networks (e.g. EUREF, Bruyninx et al., 2019). 

ROB’s repository contains GNSS data that are openly available and rigorously curated. The curation includes detailed GNSS station descriptions (e.g. location, pictures, and data author) as well as quality indicators of the GNSS observations.

However, funders and research policy makers are progressively asking for data to be made Findable, Accessible, Interoperable, and Reusable (FAIR) and therefore to increase data transparency, discoverability, interoperability, and accessibility.

In particular, within the GNSS community there is no shared agreement yet on the need for making data FAIR. Turning GNSS data FAIR therefore presents many challenges and, although FAIR data has been included in EUREF’s strategic plan, no practical roadmap has been implemented so far. We will illustrate the specific difficulties and the need for an open discussion that also includes other communities working on FAIR data.

For example, making GNSS data easily findable and accessible would require attributing persistent identifiers to the data. It is worth noting that the International GNSS Service (IGS) is only now beginning to consider the attribution of DOIs (Digital Object Identifiers) to GNSS data, mainly to allow data citation and acknowledgement of data providers. Some individual GNSS data repositories are already using DOIs (such as UNAVCO, USA). Are DOIs the only available option, or are there more suitable types of URIs (Uniform Resource Identifiers) to consider?

The GNSS community would greatly benefit from FAIR data practices, as at present, (almost) no licenses have been attributed to GNSS data, data duplication is still an issue, historical provenance information is not available because of data manipulations in data centres, citation of the data providers is far from the rule, etc.

To move further along the path towards FAIR GNSS data, one would need to implement standardised metadata models to ensure data interoperability, but, as several metadata standards are already in use in various scientific disciplines, which one to choose?

Then, to facilitate the reuse (and long-term preservation) of GNSS data, all metadata should be properly linked to the corresponding data and additional metadata, such as provenance and license information. The latter is a good example up for discussion: although the ‘CC BY’ license is already assigned to some of the GNSS data, other licenses might need to be enabled.

 

Bruyninx C., Legrand J., Fabian A., Pottiaux E. (2019) “GNSS Metadata and Data Validation in the EUREF Permanent Network”. GPS Sol., 23(4), https://doi.org/10.1007/s10291-019-0880-9

How to cite: Miglio, A., Bruyninx, C., Fabian, A., Legrand, J., Pottiaux, E., Van Nieuwerburgh, I., and Moreels, D.: Towards FAIR GNSS data: challenges and open problems, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18398, https://doi.org/10.5194/egusphere-egu2020-18398, 2020

D895 |
EGU2020-18847
Florian Haslinger and the EPOS Seismology Consortium

The European Plate Observing System EPOS is the single coordinated framework for solid Earth science data, products and services on a European level. As one of the science domain structures within EPOS, EPOS Seismology brings together the three large European infrastructures in seismology, ORFEUS for seismic waveform data & related products, EMSC for parametric earthquake information, and EFEHR for seismic hazard and risk information. Across these three pillars, EPOS Seismology provides services to store, discover and access seismological data and products from raw waveforms to elaborated hazard and risk assessment. The initial data and product contributions come from academic institutions, government offices, or (groups of) individuals, and are generated as part of academic research as well as within officially mandated monitoring or assessment activities. Further products are then elaborated based on those initial inputs by small groups or specific institutions, usually mandated for these tasks by 'the community'. This landscape of coordinated data and products services has evolved in a largely bottom-up fashion over the last decades, and led to a framework of generally free and open data, products and services, for which formats, standards and specifications continue to be emerging and evolving from within the community under a rather loose global coordination.

The advent of FAIR and Open concepts and the push towards their (formalized) implementation from various directions has stirred up this traditional setting. While the obvious benefits of FAIR and Open have been readily accepted in the community, issues and challenges are surfacing in their practical application. How can we ensure (or enforce) appropriate attribution of all involved actors through the whole data life-cycle, and what actually is appropriate? How do we ensure end-to-end reproducibility and where do we draw the practical limits to it? What approach towards licensing should we take for which products and services, and what are the legal / downstream implications? How do we best use identifiers and which ones actually serve the intended purpose? And finally, how do we ensure that effort is rewarded, that best practices are followed, and that misbehavior is identified and potentially sanctioned?

In this contribution we present how the community organization behind EPOS Seismology is discussing these issues, what approaches towards addressing them are being considered, and where we today see the major hurdles on the way towards a truly fair FAIR and Open environment.

How to cite: Haslinger, F. and Consortium, E. S.: Staying fair while being FAIR - challenges with FAIR and Open data and services for distributed community services in Seismology, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18847, https://doi.org/10.5194/egusphere-egu2020-18847, 2020

D896 |
EGU2020-17073
Graham Smith and Andrew Hufton

Researchers are increasingly expected by funders and journals to make their data available for reuse as a condition of publication. At Springer Nature, we feel that publishers must support researchers in meeting these additional requirements, and must recognise the distinct opportunities data holds as a research output. Here, we outline some of the varied ways that Springer Nature supports research data sharing and report on key outcomes.

Our staff and journals are closely involved with community-led efforts, like the Enabling FAIR Data initiative and the COPDESS 2014 Statement of Commitment [1-4]. The Enabling FAIR Data initiative, which was endorsed in January 2019 by Nature and Scientific Data, and by Nature Geoscience in January 2020, establishes a clear expectation that Earth and environmental sciences data should be deposited in FAIR-aligned community repositories [5], when available (and in general-purpose repositories otherwise). In support of this endorsement, Nature and Nature Geoscience require authors to share and deposit their Earth and environmental science data, and Scientific Data has committed to progressively updating its list of recommended data repositories to help authors comply with this mandate.

In addition, we offer a range of research data services, with various levels of support available to researchers in terms of data curation, expert guidance on repositories and linking research data and publications.

We appreciate that researchers face potentially challenging requirements in terms of the ‘what’, ‘where’ and ‘how’ of sharing research data. This can be particularly difficult for researchers to negotiate given the huge diversity of policies across different journals. We have therefore developed a series of standardised data policies, which have now been adopted by more than 1,600 Springer Nature journals.

We believe that these initiatives make important strides in challenging the current replication crisis and addressing the economic [6] and societal consequences of data unavailability. They also offer an opportunity to drive change in how academic credit is measured, through the recognition of a wider range of research outputs than articles and their citations alone. As signatories of the San Francisco Declaration on Research Assessment [7], Nature Research is committed to improving the methods of evaluating scholarly research. Research data in this context offers new mechanisms to measure the impact of all research outputs. To this end, Springer Nature supports the publication of peer-reviewed data papers through journals like Scientific Data. Analyses of citation patterns demonstrate that data papers can be well cited, and offer a viable way for researchers to receive credit for data sharing through traditional citation metrics. Springer Nature is also working hard to improve support for direct data citation. In 2018 a data citation roadmap developed by the Publishers Early Adopters Expert Group was published in Scientific Data [8], outlining practical steps for publishers to work with data citations and associated benefits in transparency and credit for researchers. Using examples from this roadmap, its implementation and supporting services, we outline how a FAIR-led data approach from publishers can help researchers in the Earth and environmental sciences to capitalise on new expectations around data sharing.

__

  1. https://doi.org/10.1038/d41586-019-00075-3
  2. https://doi.org/10.1038/s41561-019-0506-4
  3. https://copdess.org/enabling-fair-data-project/commitment-statement-in-the-earth-space-and-environmental-sciences/
  4. https://copdess.org/statement-of-commitment/
  5. https://www.force11.org/group/fairgroup/fairprinciples
  6. https://op.europa.eu/en/publication-detail/-/publication/d375368c-1a0a-11e9-8d04-01aa75ed71a1
  7. https://sfdora.org/read/
  8. https://doi.org/10.1038/sdata.2018.259

How to cite: Smith, G. and Hufton, A.: Beyond article publishing - support and opportunities for researchers in FAIR data sharing, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-17073, https://doi.org/10.5194/egusphere-egu2020-17073, 2020

D897 |
EGU2020-17013
Markus Konkol, Daniel Nüst, and Laura Goulier

Many research papers include results based on data that is analyzed using computational workflows implemented, e.g., in R. Publishing these materials is considered good scientific practice and essential for scientific progress. For these reasons, funding organizations increasingly require applicants to outline data and software management plans as part of their research proposals. Furthermore, author guidelines for paper submissions more often include a section on data availability, and some reviewers reject submissions that fail to provide the underlying materials without good reason [1]. This trend towards open and reproducible research puts some pressure on authors to make the source code and data used to produce the computational results in their scientific papers accessible. Despite these developments, publishing reproducible manuscripts is difficult and time-consuming. Moreover, simply providing access to code scripts and data files does not guarantee computational reproducibility [2]. Fortunately, several projects are working on applications that assist authors in publishing executable analyses alongside papers, considering the requirements of the aforementioned stakeholders. The chief contribution of this poster is a review of software solutions designed to solve the problem of publishing executable computational research results [3]. We compare the applications with respect to aspects that are relevant for the involved stakeholders, e.g., provided features and deployment options, and also critically discuss trends and limitations. This comparison can be used as decision support by publishers who want to comply with reproducibility principles, editors and program committees who would like to add reproducibility requirements to author guidelines, applicants of research proposals in the process of creating data and software management plans, and authors looking for ways to distribute their work in a verifiable and reusable manner. We also include properties related to preservation, relevant for librarians dealing with long-term accessibility of research materials.

 

References:

1) Stark, P. B. (2018). Before reproducibility must come preproducibility. Nature, 557(7706), 613-614.

2) Konkol, M., Kray, C., & Pfeiffer, M. (2019). Computational reproducibility in geoscientific papers: Insights from a series of studies with geoscientists and a reproduction study. International Journal of Geographical Information Science, 33(2), 408-429.

3) Konkol, M., Nüst, D., & Goulier, L. (2020). Publishing computational research - A review of infrastructures for reproducible and transparent scholarly communication. arXiv preprint arXiv:2001.00484.

How to cite: Konkol, M., Nüst, D., and Goulier, L.: Publishing computational research – A review of infrastructures for reproducible and transparent scholarly communication, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-17013, https://doi.org/10.5194/egusphere-egu2020-17013, 2020

D898 |
EGU2020-22423
Geographical scientific publications in ORBi, the ULiège institutional repository: analysis of the socio-economic influencing factors of downloads
(withdrawn)
Simona Stirbu
D899 |
EGU2020-19682
Designing services that are more than FAIR with User eXperience (UX) techniques
(withdrawn)
Carl Watson, Paulius Tvaranavicius, and Rehan Kaleem
D900 |
EGU2020-16456
Yufu Liu, Zhehao Ren, Karen K.Y. Chan, and Yuqi Bai

The World Climate Research Programme (WCRP) facilitates analysis and prediction of Earth system change for use in a range of practical applications of direct relevance, benefit and value to society. WCRP initiated the Coupled Model Intercomparison Project (CMIP) in 1995. The aim of CMIP is to better understand past, present and future climate changes arising from natural, unforced variability or in response to changes in radiative forcing in a multi-model context.

The climate model output data being produced during this sixth phase of CMIP (CMIP6) is expected to total 40-60 PB. It is still not clear whether researchers worldwide will experience major problems when downloading such a huge volume of data. This work addresses this issue by performing download speed tests for all the CMIP6 data nodes.

A Google Chrome-based data download speed test website (http://speedtest.theropod.tk) was implemented. It leverages the Allow CORS: Access-Control-Allow-Origin extension to access each CMIP6 data node. The test consists of four steps: installing and enabling the Allow CORS extension in Chrome, performing the download speed test against all CMIP6 data nodes, presenting the test results, and uninstalling the extension. The speed test downloads a fixed chunk of a model output data file from the THREDDS data server of each data node.
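The per-node measurement can be sketched in a few lines. The abstract does not publish its implementation, so the following Python sketch is illustrative only: it assumes each node serves files over plain HTTP with Range-request support, and it scores failed connections as 0 MB/s, mirroring how timeouts are reported in the results.

```python
import time
import urllib.request

def to_mb_per_s(nbytes, seconds):
    """Convert a byte count and a duration to megabytes per second."""
    if seconds <= 0:
        return 0.0
    return nbytes / seconds / 1_000_000

def measure_download_speed(url, nbytes=1_000_000, timeout=30):
    """Download up to `nbytes` from `url` and return the speed in MB/s.

    Assumes the server honours HTTP Range requests; returns 0.0 on
    connection failure or timeout, matching the 0 MB/s scoring above.
    """
    req = urllib.request.Request(url, headers={"Range": f"bytes=0-{nbytes - 1}"})
    start = time.monotonic()
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            data = resp.read(nbytes)
    except OSError:
        return 0.0
    return to_mb_per_s(len(data), time.monotonic() - start)
```

Running `measure_download_speed` against one file URL per data node and collecting the results would reproduce the shape of the experiment, though the actual test runs in the browser rather than in Python.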

Researchers from 11 countries have performed this test in 24 cities against all 26 CMIP6 data nodes. The fastest transfer speed was 124 MB/s, and the slowest was 0 MB/s because of connection timeouts. Data transfer speed in developed countries (United States, the Netherlands, Japan, Canada, Great Britain) is significantly faster than that in developing countries (China, India, Russia, Pakistan). In developed countries the mean transfer speed is roughly 80 Mb/s, equal to the median US residential broadband speed provided by cable or fiber (FCC Measuring Fixed Broadband, Eighth Report), but in developing countries the mean transfer speed is usually much slower, roughly 9 Mb/s. Data transfer speed was significantly faster when the data node and the test site were both in developed countries, for example, when downloading data from IPSL, DKRZ or GFDL at Wolvercote, UK.

Although further tests are definitely needed, these preliminary results clearly show that actual data download speeds vary dramatically between countries and between data nodes. This suggests that ensuring smooth access to CMIP6 data is still challenging.

How to cite: Liu, Y., Ren, Z., Chan, K. K. Y., and Bai, Y.: Data download speed test for CMIP6 model output: preliminary results, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-16456, https://doi.org/10.5194/egusphere-egu2020-16456, 2020