ESSI2.5 | Research data infrastructures in ESS facing user needs, sustainability, and cultural change
EDI
Co-organized by EOS4
Convener: Daniel Nüst (ECS) | Co-conveners: Christin Henzen (ECS), Kirsten Elger, Christian Pagé, Heinrich Widmann, Kerstin Lehnert
Orals | Tue, 25 Apr, 16:15–18:00 (CEST) | Room 0.16
Posters on site | Attendance Wed, 26 Apr, 10:45–12:30 (CEST) | Hall X4
Research data infrastructures (RDIs) aim to manage and share research products and metadata systematically to enable research across all scales and disciplinary boundaries. Their services support researchers throughout the entire research lifecycle, especially during data management and collaborative analysis, and they foster FAIRness and openness, e.g., by applying established standards for metadata, data, and/or scientific workflows. Through their offerings and services, RDIs can shape research practices and are strongly connected with the communities of users that identify and associate themselves with them.

Naturally, RDIs face many challenges in realizing this potential. Even though it is clear that RDIs are indispensable for solving big societal problems, their wide adoption requires a cultural change within research communities. At the same time, RDIs themselves must be developed further to serve user needs, their sustainability must be improved, international cooperation must be increased, and duplication of development efforts must be avoided. To provide a community of diverse career stages and backgrounds with a convincing infrastructure that is established beyond national and institutional boundaries, new collaboration patterns and funding approaches must be tested so that RDIs foster cultural change in academia and become a reliable foundation for FAIR and open research. This needs to happen while academia struggles with improving researcher evaluation, with a continuing digital disruption, with enhancing scholarly communication, and with diversity, equity, and inclusion.

In the Earth System Sciences (ESS), several research data infrastructures and components are currently being developed on different regional and disciplinary scales, all of which face these challenges at some level. This session provides a forum to exchange methods, stories, and ideas to enable cultural change and international collaboration in scientific communities, to bridge the gap between user needs, and to build sustainable software solutions.

Orals: Tue, 25 Apr | Room 0.16

Chairpersons: Kirsten Elger, Heinrich Widmann, Daniel Nüst
16:15–16:20
16:20–16:30 | EGU23-7968 | On-site presentation
David Carlson, Hans Pfeiffenberger, and Kirsten Elger

Science changes direction, practice and impact when researchers discover tangible rewards. Policy organizations, funding agencies and educational institutions might wish otherwise, but long experience suggests that new incentive structures, better recognition of engagement, and cultural change emerge bottom-up, not top-down. For these reasons, the recent emergence of data journals that issue valid credit to data providers, accompanied by guidance and assurance for data users, promotes rapid positive change in data sharing and impact. In Earth System Science Data (ESSD), a ‘new’ (since 2009) Copernicus journal (and, likewise, in Scientific Data by Springer/Nature since 2014 and in a few other recent journals), authors experience widespread, often unanticipated, impact of their data through use, re-use, and citation. Meanwhile, journalists discover, and appreciate, the reliability and utility of data publications, particularly for climate, biodiversity or public health data products that are updated on a regular (e.g. annual) basis. With care and cooperation, ESSD publications on topics such as agricultural or woodfire emissions, population, global carbon, methane or energy budgets, or regional pipeline capabilities: a) feed and support ‘front-page’ articles in BBC, Washington Post and other nationally- and internationally-prominent news sources; b) develop useful options for essential planetary monitoring (e.g. as components incorporated into UNFCCC’s proposed Global Stocktake); and c) demonstrate science - via normal steps of scrutiny and revision - engaged with urgent social issues. Through familiar but innovative mechanisms, researchers gain validation and certification of data (for citation credit!), ensure wide re-use throughout broad research communities, and often achieve substantial public impact. These new mechanisms signal a positive change in the culture of our science.

How to cite: Carlson, D., Pfeiffenberger, H., and Elger, K.: Data journals induce culture change in earth sciences, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-7968, https://doi.org/10.5194/egusphere-egu23-7968, 2023.

16:30–16:40 | EGU23-2780 | ECS | On-site presentation
Kristina Vrouwenvelder and Shelley Stall

Open Science is transformative, removing barriers to sharing science and increasing reproducibility and transparency. The benefits of Open Science are maximized when its principles are incorporated throughout the research process, through working collaboratively with community members and sharing data, software, workflows, samples, and other aspects of scientific research as FAIRly and openly as possible. However, the paths toward Open Science are not always apparent. Developing Open Science skills is an ongoing practice, and while these skills enhance outcomes for individual researchers as well as the broader community, there are many concepts, approaches, and tools to learn along the way.

How can we break down the barriers confronting researchers in their Open Science journey? How can we develop and support necessary infrastructure to reuse, distribute, and reproduce the outputs of scientific research? How do we create a culture where having better tools, practices, and methods helps us achieve this goal? 

We will share work by AGU, our collaborators, and the broader community to support researchers in the Open Science journey, build groups to share resources, leading practices, and experiences, and help develop networks of support across the Earth, space, and environmental science community at all levels, to better support the culture of the future.

How to cite: Vrouwenvelder, K. and Stall, S.: Community Building for Data Sharing and Open Science within the Earth, Space, and Environmental Sciences, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-2780, https://doi.org/10.5194/egusphere-egu23-2780, 2023.

16:40–16:50 | EGU23-15439 | solicited | On-site presentation
Daniele Bailo, Rossana Paciello, Valerio Vinciarelli, and Carmela Freda

The path from the conception of a disruptive and innovative Research Data Infrastructure (RDI) to a successful operational RDI is influenced almost entirely by the ability to engage researchers, IT experts, practitioners and data providers at equal levels in its usage and adoption. This is particularly true for distributed RDIs.

In this work, we present the lessons learnt in EPOS (European Plate Observing System), the unique, distributed pan-European Research Infrastructure in the solid Earth domain. In EPOS a series of challenges were faced, in terms of consensus-driven choices along the technical, governance, sustainability, and scientific dimensions.

EPOS is built to promote collaboration and the harmonization of heterogeneous datasets, practices, and methods from ten different solid Earth communities. The final goal is to foster innovation and facilitate novel scientific discoveries. EPOS is a large research infrastructure including more than 60 data and service providers from 25 European countries, providing 250 data services, delivering more than 30 different data formats, and covering more than 800 TB of data in total, described by more than 20 different metadata standards. It was conceived back in 2002, included in the ESFRI (European Strategic Forum on Research Infrastructures) Roadmap in 2008, then implemented through three European projects: EPOS-PP Preparatory Phase (2010-2014), EPOS-IP Implementation Phase (2015-2019), and EPOS-SP Sustainability Phase (2020-2023). EPOS was granted the status of ERIC (European Research Infrastructure Consortium) in 2018 and has been in its Operational Phase since January 2023.

The first lesson learned in this journey is related to the need for procedures and boards for community building and consensus establishment; this was achieved through clear governance where all key stakeholders interact and are informed through appropriate boards and committees. The second one is technical: to integrate such heterogeneous datasets into a single platform (the EPOS Data Portal), a flexible architecture based on the microservices approach was adopted. The third lesson is related to the description of the datasets and services provided by the various thematic communities in EPOS, achieved through a rich metadata model that captures enough information to drive the integration occurring at the central system underpinning the EPOS Data Portal. The fourth lesson is related to the legal and governance aspects: to keep communities committed, legal agreements for governance and coordination and for the thematic data provision were established; this ensures community engagement and the adoption of common criteria and principles.

Finally, the fifth lesson is related to the co-development approach. To manage decisions and consensus on key technical and scientific aspects within a community of more than 80 individuals with different roles, responsibilities and expertise, a clear process was set up. It is inspired by the Shape Up methodology but adapted for the research context, and it proved to be effective in the EPOS RI, where international collaboration is needed to manage integrated data provision.

Many challenges remain open, for instance how to recognize and encourage careers within RDIs. These careers require specific skills, and taking on responsibilities within an RDI should be recognized by setting up dedicated career paths.

How to cite: Bailo, D., Paciello, R., Vinciarelli, V., and Freda, C.: Community engagement and uptake: lessons learnt in EPOS, the Research Infrastructure for Solid Earth Sciences., EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15439, https://doi.org/10.5194/egusphere-egu23-15439, 2023.

16:50–17:00 | EGU23-14456 | On-site presentation
Jörg Seegert, Daniel Nüst, and Lars Bernard

Academia faces enormous challenges to realign publication and evaluation practices towards a sustainable, open, transparent, diverse, equitable, and inclusive way of conducting research. However, values, attitudes, behaviours, and habits are difficult to change. Furthermore, individuals and organisations do not act in isolation but in different contexts - from local to international, from labs to academic societies, from early career researchers to university leadership. This makes cultural change very demanding. The Earth System Sciences (ESS) consortium of the German Research Data Infrastructure programme (NFDI4Earth, https://nfdi4earth.de/) takes deliberate steps towards introducing such cultural change in the form of the NFDI4Earth FAIRness and Openness Commitment (https://nfdi4earth.de/2coordinate/cultural-change). Intentionally going beyond FAIR data, the commitment is set up as a means to introduce a cultural change towards openness and to initiate a discourse on the full circle of related values and practices as well as connected social, cultural, and economic topics.

The NFDI4Earth Commitment can be endorsed by the institutions and organisations involved and related to NFDI4Earth, such as research institutions, publishers, or funders, as well as signed by individual researchers. The commitment is designed as a continuous activity and is iterative in its nature: signatories are invited annually to sign the latest version, which over time becomes more extensive or focused, based on the community discussions and the general developments, e.g., in academia as a whole. 

The NFDI4Earth Commitment will create a sense of identity for all actors and become an instrument to demonstrate values. It provides commonly accepted, explicit documentation of what the signatories strive to adhere to, e.g., best practices in ESS research data management, and for which they can be held accountable by members or partners. In its current incarnation, the NFDI4Earth Commitment puts a particular focus on the need for individuals and organisations to question and reflect on their own personal and institutional behaviours and attitudes as well as those of their fellows, on the breadth of scientific contributions, and on scientific evaluation and incentives. Furthermore, it stresses the value of sharing and collaborating and presents a positive picture of change. To emphasize the iterative nature, the NFDI4Earth Commitment includes a first level or stage of commitment, with more levels to be expected in future incarnations.

In this work, we present the current version of the NFDI4Earth Commitment, the process for its creation, the steps taken and planned for community engagement, and considerations for the future development, in particular the transferability to international communities or other disciplines within the context of the NFDI. Where already clear, we present lessons learned on the creation and introduction of the commitment.

How to cite: Seegert, J., Nüst, D., and Bernard, L.: Initiating Cultural Change in the German Earth System Sciences Community with a Commitment Statement, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-14456, https://doi.org/10.5194/egusphere-egu23-14456, 2023.

17:00–17:10 | EGU23-12638 | ECS | On-site presentation
Amruth Kiran and Teja Malladi

What would it take to set up a center of geospatial excellence?

Across academia and industry, there is a growing need to complement the exponential growth of Big Earth Observation (EO) data and the numerous possibilities for processing, rendering, visualization, and sharing geospatial data and its derivatives. The challenges that come with such opportunities include the identification of efficient tools, hiring and training of people at different skill levels, and the outreach activities that enable easier communication of scientific findings. These silos need to be broken down at each step of the process to help sustain the growth of a lab in the long run.

One such example is the Geospatial Lab (GSL) at the Indian Institute for Human Settlements (IIHS), Bangalore, India. Over the years, GSL has catered to many research, practice, and capacity-building initiatives of the institute, which has helped bridge the gap and bring about a cultural change in the appreciation of the geospatial sciences. Developing a robust Research Data Infrastructure (RDI) using the foundational principles of open-source Spatial Data Infrastructure (SDI) at all levels of community engagement has proven effective in reaching the right audience and decision-makers. Standard practices within the lab such as technical documentation, internal capacity building, extensive metadata, modern spatial/computing practices, spatial data management frameworks, and the interdisciplinarity of the team have seen greater adoption across the institution as well. These factors, coupled with excellent institutional support, have been at the forefront of building a scalable, interoperable, and distributed RDI that aims to prioritize the people over the pixel.

How to cite: Kiran, A. and Malladi, T.: Building a Geospatial Lab – The People, The Tools, and The Process., EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12638, https://doi.org/10.5194/egusphere-egu23-12638, 2023.

17:10–17:20 | EGU23-10177 | On-site presentation
Aimee Barciauskas, Manil Maskey, Doug Newman, Slesa Adhikari, Olaf Veerman, Leo Thomas, Alexandra Kirk, Kaylin Bugbee, Brian Freitag, Alexey Shiklomanov, Alex Mandel, Hook Hua, and George Chang

NASA's Visualization, Exploration, and Data Analysis (VEDA) project is an open-source science cyberinfrastructure for data processing, visualization, exploration, and geographic information systems (GIS) capabilities (https://www.earthdata.nasa.gov/esds/veda). VEDA was an ambitious platform and one that was only made possible in the past year by building upon existing NASA projects. The extensive technology community at NASA continues to come together to design, build and use VEDA’s interoperable APIs and datasets.

This presentation will demo the current capabilities of VEDA and discuss how these capabilities were designed and architected with the central goals of science delivery, reproducible science, and interoperability to support re-use of data and APIs across NASA’s Earth Science ecosystem of tools. The presentation will close with VEDA’s future plans. In 2023, VEDA will support NASA’s Transform to Open Science (TOPS) program and open-source science initiatives through data, APIs and analytics platforms. In 2023 and beyond, VEDA will advance the state of the art in cloud-based Earth science and strengthen the ties of technology within NASA.

The projects behind VEDA’s current features are:

  • The Multi-Mission Algorithm and Analysis Platform (https://maap-project.org/, presented at EGU 2019): Recognizing the numerous advantages of open, reproducible science, NASA and ESA are working together to create the Joint ESA-NASA MAAP. The MAAP brings together relevant data and algorithms in a common virtual environment in order to support the global aboveground terrestrial carbon dynamics research community. 
  • The COVID-19 Earth Observation Dashboard (https://www.earthdata.nasa.gov/covid19/): Following the interest in this dashboard, NASA invested in the design and development of a new dashboard infrastructure. This infrastructure is highly configurable to support easily adding new datasets and discoveries. UI and config layers are built upon the VEDA STAC catalog and Cloud-Optimized GeoTIFFs.
  • The Earth Information System (EIS) pilots (https://eis.smce.nasa.gov/): Scientists at NASA worked together on open science tools to develop new research projects using Earth Observation data across the domains of fire, freshwater, greenhouse gases, and sea level rise.
  • ArcGIS Enterprise in the Cloud (gis.earthdata.nasa.gov) provides GIS capabilities.

The projects listed above have all made VEDA a reality in a year. The scientists from EIS are using the new dashboard infrastructure to tell their stories and the analytics backend from MAAP to scale their science.

In 2023, VEDA has many initiatives in the works to extend its reach within and beyond NASA.

There are many advanced technologies at NASA and we see an opportunity for VEDA to support closing the information gaps across groups. For example, VEDA will support driving standards for using, publishing and visualizing NASA’s Earthdata Zarr archives and also deliver interoperable APIs for its data stores to support dynamic data visualization and storytelling.

VEDA will also extend its reach beyond NASA by providing a JupyterHub for any user to explore the data behind NASA Earth Science, specifically the discoveries presented in the Earthdata Dashboard.

How to cite: Barciauskas, A., Maskey, M., Newman, D., Adhikari, S., Veerman, O., Thomas, L., Kirk, A., Bugbee, K., Freitag, B., Shiklomanov, A., Mandel, A., Hua, H., and Chang, G.: The Origins of NASA’s Visualization, Exploration and Data Analytics (VEDA) Platform: A platform for biomass research, NASA’s Earth Information System and the COVID-19 Dashboard, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10177, https://doi.org/10.5194/egusphere-egu23-10177, 2023.

17:20–17:30 | EGU23-15367 | ECS | On-site presentation
Christof Lorenz, Mostafa Hadizadeh, Sabine Barthlott, Romy Fösig, Uğur Çayoğlu, Robert Ulrich, and Felix Bach

A contemporary and flexible Research Data Management (RDM) framework is required to make environmental research data Findable, Accessible, Interoperable, and Reusable (FAIR) and, hence, provide the foundation for open and reproducible earth system sciences. While data-sets that accompany scientific articles are typically published via large data repositories like Pangaea or Zenodo, intermediate, day-to-day, or actively-used data (e.g., data from research projects or prototypical data) is still exchanged via simple cloud storage services and email. And while the FAIR principles require data to be openly findable and accessible, it is often only available within closed and restricted infrastructures and local file systems.

Our research project Cat4KIT hence aims to develop a cross-institutional catalog and RDM framework for the FAIRification of such day-to-day research data. This framework comprises four modules / services for

  • providing access to data on storage systems through well-defined and standardized interfaces 

  • harvesting and transforming (meta)data into standardized formats

  • making (meta)data accessible to the public using well-defined and standardized catalog services and interfaces

  • enabling users to search, filter, and explore data from decentralized research data infrastructures.

We develop, implement and evaluate each of these four modules within an inter-institutional consortium consisting of scientists, software developers and potential end-users. This allows us to include a wide range of research data, from multi-dimensional climate model outputs to high-frequency in-situ measurements. We emphasize the application of existing open-source solutions and community standards for data interfaces (THREDDS, STA, S3), (meta)data schemes, and catalog services (SpatioTemporal Asset Catalog, STAC) in order to ensure an easy integration of research data into the Cat4KIT-framework and a straightforward extension to further research data infrastructures.
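
The harvesting and transformation step described above can be illustrated with a minimal Python sketch that maps one harvested metadata record to a STAC Item. This is not the Cat4KIT implementation: the layout of the input record, the function name `to_stac_item`, and the example URL are assumptions made for illustration. Only the top-level Item fields (type, stac_version, id, geometry, bbox, properties.datetime, links, assets) follow the STAC Item specification.

```python
from datetime import datetime, timezone


def to_stac_item(record: dict) -> dict:
    """Map a harvested metadata record (hypothetical layout) to a minimal STAC Item."""
    west, south, east, north = record["bbox"]
    # A STAC Item is a GeoJSON Feature; here the footprint is the bounding-box polygon.
    geometry = {
        "type": "Polygon",
        "coordinates": [[
            [west, south], [east, south], [east, north], [west, north], [west, south],
        ]],
    }
    # STAC expects an RFC 3339 timestamp; convert to UTC before formatting.
    timestamp = record["time"].astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": record["id"],
        "geometry": geometry,
        "bbox": [west, south, east, north],
        "properties": {
            "datetime": timestamp,
            "title": record.get("title", record["id"]),
        },
        "links": [],
        "assets": {
            "data": {
                "href": record["url"],
                "type": record.get("media_type", "application/octet-stream"),
            },
        },
    }


# Hypothetical record, e.g. as it might be harvested from a data server.
example_record = {
    "id": "station-42-2023-04-25",
    "title": "Hypothetical in-situ temperature series",
    "bbox": (8.0, 48.5, 9.0, 49.5),
    "time": datetime(2023, 4, 25, 12, 0, tzinfo=timezone.utc),
    "url": "https://example.org/data/station-42.nc",
    "media_type": "application/x-netcdf",
}
item = to_stac_item(example_record)
```

Because the output is plain GeoJSON-compatible JSON, such items could in principle be served by any STAC-compliant catalog service and searched by any STAC client.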

In our presentation, we demonstrate the current status of our Cat4KIT-framework as an inter-institutional research data management and catalog platform for the FAIRification of day-to-day research data.

How to cite: Lorenz, C., Hadizadeh, M., Barthlott, S., Fösig, R., Çayoğlu, U., Ulrich, R., and Bach, F.: CAT4KIT: A cross-institutional data catalog framework for the FAIRification of environmental research data, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15367, https://doi.org/10.5194/egusphere-egu23-15367, 2023.

17:30–17:40 | EGU23-16396 | ECS | On-site presentation
Pavel Golodoniuc, Vincent Fazio, YunLong Li, Neda Taherifah, Jens Klump, and Lesley Wyborn

AuScope is Australia’s premier research infrastructure provider to the national geoscience community, working on fundamental geoscience questions and grand challenges for the common good and into the future. The organisation is funded by the Australian Government via the National Collaborative Research Infrastructure Strategy (NCRIS). One of its programs – the AuScope Virtual Research Environment (AVRE) – provides a unifying technological platform for AuScope Programs’ data and analytical needs and increases the uptake of data-driven research through various outreach activities. One such activity was the development of the AVRE Build Program that successfully ran over the past three years and aimed to improve the engagement of research teams and assist with translating scientific requirements into reusable solutions ranging from data management to numerical modelling and complex data visualisation.

Developing scientific software solutions with the diverse backgrounds of stakeholders involved is challenging in itself. In a multidisciplinary environment, we had to collaboratively develop and strengthen our design approach to break the “language barrier” between scientists and technologists to achieve greater user acceptance and ongoing adoption of developed solutions. Our approach stems from the Rapid Application Development and the Agile project management methodologies – both popular and widely applied in the realm of software engineering.

We take a user-centred design approach and involve researchers with a vested interest in the project outcomes in all stages of the iterative development lifecycle. We pay particular attention to the definition of project success and a minimum viable project, requirements analysis, wireframing and prototyping through the project launch and handover phases. Organising projects into short, focused sprints with the direct involvement of researchers has allowed us to stay focused on our objectives, deliver projects in short timeframes, and maintain momentum. Through this process and the direct involvement of researchers in the design aspects of the product, we fostered a close collaborative relationship with our users, created a sense of ownership and, as a result, cemented the longevity of the project under the researchers’ custodianship.

Herein, we detail our approach to scientific software development, the social aspects of our experience of cross-institutional and cross-domain collaboration, the challenges we have experienced, and the successes we have achieved. Although still offering room for improvement, the methodologies we employ have proved successful over the last three years, producing low-maintenance tools that are freely accessible to researchers. They helped to engage a wider audience and improve the speed of science delivery, which inspired other projects within the CSIRO Mineral Resources Business Unit and external organisations to implement similar programs.

How to cite: Golodoniuc, P., Fazio, V., Li, Y., Taherifah, N., Klump, J., and Wyborn, L.: Methodologies and techniques to engage geoscience researchers in the technical design process and product development, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16396, https://doi.org/10.5194/egusphere-egu23-16396, 2023.

17:40–17:50 | EGU23-3269 | ECS | Virtual presentation
Linda Baldewein, Housam Dibeh, Philipp S. Sommer, and Ulrike Kleeberg

In Earth System Sciences, new data portals are currently being developed by what seems to be each new project and research initiative. But what happens to already existing solutions that are in dire need of a software update? We will introduce the HCDC datasearch portal (https://hcdc.hereon.de/datasearch/), an open-source software solution that combines data from a legacy database, file storage systems, OGC-conformant web services and a World Data Center. Our portal provides a common interface for all our heterogeneous data-sources to select and to download the data-products based on filters for metadata and spatio-temporal information.

Three legacy portal solutions at Helmholtz-Zentrum Hereon are replaced by a scalable and easily extendable new portal based on an Elasticsearch cluster in the back-end and a user-friendly web interface as well as a machine-readable API in the front-end. To ensure software that fits the users’ workflows, a stakeholder group was involved from the early stages of the planning up until the release of the final product.

Extensibility of the portal is ensured by only storing metadata within the portal. Data access and download is configured based on each decentralized storage solution, e.g. a local database or a World Data Center. Harmonization of metadata is crucial for the user experience of the portal. We limited the searchable metadata to 14 fields in addition to geospatial and temporal metadata, including information such as the platform from which the data originates and the parameter that was measured. Whenever possible, controlled vocabularies were used. Due to the heterogeneity of the data, including climate model results as well as long-tail biogeochemical campaign data, this is an ongoing process.
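
To illustrate how filters on harmonized metadata fields can be combined with spatio-temporal constraints, the following Python sketch builds an Elasticsearch bool/filter query body. It is not the portal's actual query logic: the field names (`platform`, `parameter`, `time_start`, `time_end`, `location`) are hypothetical, and the sketch assumes that each metadata record carries a representative geo_point, whereas a real index might store full extents as geo_shape instead.

```python
def build_search_query(filters: dict, bbox=None, time_range=None) -> dict:
    """Build an Elasticsearch bool/filter query body from portal-style search inputs."""
    # One exact-match clause per metadata field (sorted for a stable clause order).
    clauses = [{"term": {name: value}} for name, value in sorted(filters.items())]
    if time_range is not None:
        start, end = time_range
        # Keep records whose coverage overlaps the requested interval.
        clauses.append({"range": {"time_start": {"lte": end}}})
        clauses.append({"range": {"time_end": {"gte": start}}})
    if bbox is not None:
        west, south, east, north = bbox
        # Assumes a geo_point field named "location" (hypothetical).
        clauses.append({
            "geo_bounding_box": {
                "location": {
                    "top_left": {"lat": north, "lon": west},
                    "bottom_right": {"lat": south, "lon": east},
                }
            }
        })
    return {"query": {"bool": {"filter": clauses}}}


query = build_search_query(
    {"platform": "research vessel", "parameter": "salinity"},
    bbox=(6.0, 53.0, 9.0, 55.5),
    time_range=("2020-01-01", "2020-12-31"),
)
```

Filter clauses do not contribute to relevance scoring and are combined with a logical AND, which matches the faceted narrowing that a search interface of this kind typically offers.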

The HCDC datasearch portal provides an example of the challenges and opportunities of combining data from distributed data sources through a single entry-point based on state-of-the-art web technologies. It can be used to discuss the challenges of re-using legacy solutions in a continually progressing research data infrastructure world.

How to cite: Baldewein, L., Dibeh, H., Sommer, P. S., and Kleeberg, U.: HCDC datasearch portal: Replacing legacy solutions with a unified open-source portal, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3269, https://doi.org/10.5194/egusphere-egu23-3269, 2023.

17:50–18:00 | EGU23-3511 | Virtual presentation
Adrian Clark and Kurt Hansen

Researchers need access to infrastructures to make their research data findable, accessible, interoperable and reusable, known as FAIR. The FAIR principles of data sharing transcend disciplines and encapsulate the aims and ideals of Open Data advocates.

As researchers look to make their research data FAIR, in some disciplines this may result in data being shared in disparate solutions. If we look at the Geosciences for example, a researcher may use GitLab to share their code, a hosted website for discoverability and separate storage for the preservation of legacy data. Research groups often possess a rich heritage of data spanning periods of several decades. However, this legacy almost never gets proper attention due to lack of funding, and thus lack of long-term maintenance plans. As a consequence, the legacy data are typically unreadable and inaccessible due to obsolete formats and technologies in which they are provided.

This presentation will demonstrate how 25 years of wind data at the Technical University of Denmark (DTU) has been shared in accordance with the FAIR principles in their Figshare powered repository. We’ll demonstrate how the long term availability of the data is secured and how discoverability is ensured and data reuse is encouraged with robust metadata. This presentation will also touch on the importance of data reuse throughout the research lifecycle and showcase this for both an academic and lay audience through the features of the DTU Data repository.

How to cite: Clark, A. and Hansen, K.: The Value of Legacy data - securing access and reuse of 25 years of data, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3511, https://doi.org/10.5194/egusphere-egu23-3511, 2023.

Posters on site: Wed, 26 Apr, 10:45–12:30 | Hall X4

Chairpersons: Christin Henzen, Christian Pagé, Daniel Nüst
X4.157 | EGU23-17491
Christin Henzen, Auriol Degbelo, and Daniel Nüst

Research data infrastructures (RDIs) are evolving and driven by diverse initiatives, ranging from international to local (e.g., institutional) scales, that try to build sustainable and useful services. The German national data infrastructure for Earth System Sciences (NFDI4Earth) aims to support researchers in 1) discovering and exploring relevant data sources, 2) data publication and curation, 3) solving research data management problems and 4) creating and publishing information products. In the development of the software architecture for NFDI4Earth, we face challenges of a computational, social, cultural, and strategic nature. Here, we present an overview of these challenges and early outcomes and reflect on first lessons learned from the initial concept and development phase of the NFDI4Earth architecture.

Starting with the (meta)data layer, the landscape of existing ESS services and repositories is diverse and features various metadata and data, like governmental (meta)data following INSPIRE, OGC, and ISO19xxx. This diversity demands harmonising and linking concepts that fit standards for metadata, data and services, such as OGC APIs or ISO19xxx, as well as Semantic Web concepts, e.g., FAIR Digital Objects, and that provide extension points for (newly developed) specific formats. At the same time, the software stack and technologies of the business layer should consider interoperability, openness and sustainability aspects while providing a flexible solution to manage the distributed metadata. Moreover, in our case, activities on developing (meta)data and business layer concepts also include coordinating a software developer team with different scientific and technological backgrounds spread across several institutions.
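
One common way to approach such harmonisation is a crosswalk that maps source-specific field names onto a shared core schema while keeping unmapped fields available as extensions. The Python sketch below illustrates this idea only; it is not the NFDI4Earth architecture. The schema labels `iso_like` and `datacite_like` and all field names are simplified, hypothetical stand-ins and do not reproduce the actual structure of ISO 19115 or DataCite records.

```python
# Hypothetical crosswalks: source field name -> common field name.
CROSSWALKS = {
    "iso_like": {"title": "title", "abstract": "description", "dateStamp": "modified"},
    "datacite_like": {"titles": "title", "descriptions": "description", "dates": "modified"},
}


def harmonise(record: dict, schema: str) -> dict:
    """Translate a flat source record into the common core schema."""
    mapping = CROSSWALKS[schema]
    # Copy every mapped field that is present under its common name.
    common = {target: record[source] for source, target in mapping.items() if source in record}
    # Fields without a mapping are preserved as an extension point, not dropped.
    common["extensions"] = {key: value for key, value in record.items() if key not in mapping}
    common["source_schema"] = schema
    return common


record_a = harmonise(
    {"title": "Sea level", "abstract": "Tide gauge data", "lineage": "raw"}, "iso_like"
)
record_b = harmonise(
    {"titles": "Sea level", "descriptions": "Tide gauge data"}, "datacite_like"
)
```

After harmonisation, both records expose `title` and `description` under the same names, so downstream services can treat them uniformly, while format-specific details such as `lineage` remain accessible under `extensions`.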

NFDI4Earth is located in a dynamic landscape of ESS services and repositories which are often not sustainably funded. Hence, we need to implement practices and collaborations to link or integrate further software, services, and information products so that an up-to-date, living, and evolving architecture serves the needs of researchers.

How to cite: Henzen, C., Degbelo, A., and Nüst, D.: Challenges in Developing a Software Architecture for a National Research Data Infrastructure in Earth System Sciences, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-17491, https://doi.org/10.5194/egusphere-egu23-17491, 2023.

X4.158
|
EGU23-3686
Stephanie Wingo, Deborah Smith, Carson Davis, Shelby Bagwell, Heidi Mok, Edward Keeble, Tammo Feldmann, Anthony Lukach, Alice Ruehl, Camille Woods, Ashlyn Shirey, Elijah Walker, and Rahul Ramachandran

NASA’s Airborne Data Management Group (ADMG) works to promote and ensure the discoverability and accessibility of over 50 years of non-satellite Earth science observations. This includes leading the development of the Catalog of Archived Suborbital Earth Science Investigations (CASEI), which provides a single entry point to efficiently search across all of NASA’s airborne and field data holdings.  CASEI supports NASA’s Open Source Science Initiative vision by providing holistic, descriptive contextual metadata and links to streamline access to suborbital data products, regardless of which repository is responsible for their stewardship.

Metadata in CASEI includes descriptive contextual details that are typically arduous to locate amid a synthesis of scattered publications, project and program websites, and disparate data discovery tools. These metadata include motivating science objectives, key events/time periods in observational records, complementary simultaneous observations, and programmatic details, among others. The diversity of data formats and science disciplines served by CASEI necessitates a common data model to organize suborbital observation metadata and appropriately represent the relationships among campaigns, platforms, and instruments.
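
As a rough illustration of what a common data model relating campaigns, platforms, and instruments might look like, the following sketch uses plain Python dataclasses. All class names, field names, and sample values are hypothetical and are not taken from the actual CASEI schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Instrument:
    short_name: str
    measurement: str  # free-text description of what is measured


@dataclass
class Platform:
    short_name: str
    kind: str  # e.g. "aircraft" or "ground site"
    instruments: list[Instrument] = field(default_factory=list)


@dataclass
class Campaign:
    short_name: str
    science_objective: str
    platforms: list[Platform] = field(default_factory=list)

    def instrument_names(self) -> set[str]:
        """Names of all instruments deployed on any platform of this campaign."""
        return {i.short_name for p in self.platforms for i in p.instruments}


def campaigns_using(instrument: str, campaigns: list[Campaign]) -> list[str]:
    """Return the campaigns (in input order) that deployed the given instrument."""
    return [c.short_name for c in campaigns if instrument in c.instrument_names()]


# Hypothetical sample content, for illustration only.
radar = Instrument("RADAR-X", "reflectivity")
lidar = Instrument("LIDAR-Y", "aerosol backscatter")
campaign_a = Campaign(
    "CAMPAIGN-A", "storm structure",
    [Platform("AIRCRAFT-1", "aircraft", [radar, lidar])],
)
campaign_b = Campaign(
    "CAMPAIGN-B", "aerosol transport",
    [Platform("SITE-1", "ground site", [lidar])],
)
print(campaigns_using("LIDAR-Y", [campaign_a, campaign_b]))
```

Keeping instruments attached to platforms, and platforms to campaigns, lets a single query answer questions such as "which campaigns flew this instrument" without duplicating metadata per campaign.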

This presentation will describe the development of the CASEI system: well-defined data models to drive a cloud-based user data access portal, simultaneously provisioned interfaces enabling synchronous metadata updates, the curation process required to sustain this unique inventory of airborne and field metadata, management of CASEI information content, and connecting end users to data products relevant for their interests - regardless of which NASA distributed archive center holds the data.

Particular attention will be given to how CASEI facilitates discovery and (re)use of these lesser-known NASA data, supporting FAIR principles and Open Science to enhance the return on investments made in these unique and varied observations. An up-to-date summary of CASEI inventory content, avenues for CASEI enhancements, and potential improvements in suborbital data stewardship at various stages of the data life cycle will also be discussed.

How to cite: Wingo, S., Smith, D., Davis, C., Bagwell, S., Mok, H., Keeble, E., Feldmann, T., Lukach, A., Ruehl, A., Woods, C., Shirey, A., Walker, E., and Ramachandran, R.: An Introduction to NASA’s Catalog of Archived Suborbital Earth Science Investigations (CASEI), EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3686, https://doi.org/10.5194/egusphere-egu23-3686, 2023.

X4.159
|
EGU23-6311
|
ECS
Marco Kulüke, Stephan Kindermann, and Tobias Kölling

The InterPlanetary File System (IPFS) is a novel decentralized file storage network that allows users to store and share files in a distributed manner, which can make it more resilient if individual infrastructure components fail. It also allows for faster access to content as users can get files directly from other users instead of having to go through a central server. However, one of the challenges of using IPFS is ensuring that the files remain available over time. This is where an IPFS pinning service offers a solution. An IPFS pinning service is a type of service that allows users to store and maintain the availability of their files on the IPFS network. The goal of an IPFS pinning service is to provide a reliable and trusted way for users to ensure that their files remain accessible on the IPFS network. This is accomplished by maintaining a copy of the file on the service's own storage infrastructure, which is then pinned to the IPFS network. This allows users to access the file even if the original source becomes unavailable.

We explored the use of the IPFS for scientific data with a focus on climate data. We set up an IPFS node running on a cloud instance at the German Climate Computing Center where selected scientists can pin their data and make them accessible to the public via the IPFS infrastructure. IPFS is a good choice for climate data, because the open network architecture strengthens open science efforts and enables FAIR data processing workflows. Data within the IPFS is freely accessible to scientists regardless of their location, and large files can be accessed quickly. In addition, data within the IPFS is immutable, which ensures that the content behind a content identifier does not change over time. Because the IPFS is a recent development, the project outcomes are novel data science developments for Earth system science and potentially relevant building blocks for the Earth system science community.
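
The link between content identifiers, immutability, and pinning can be illustrated with a toy content-addressed store. This is a simplified sketch, not the IPFS implementation: a plain SHA-256 hex digest stands in for a real content identifier (real CIDs use multihash encoding over chunked data), and the class and method names are hypothetical.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the address is derived from the bytes themselves."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}
        self._pinned: set[str] = set()

    def add(self, data: bytes) -> str:
        # The address depends only on the content, so identical data
        # always maps to the same address and changed data to a new one.
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data
        return address

    def pin(self, address: str) -> None:
        """Mark content as protected from garbage collection."""
        if address not in self._blocks:
            raise KeyError(address)
        self._pinned.add(address)

    def garbage_collect(self) -> None:
        """Drop every block that nobody has pinned."""
        self._blocks = {a: d for a, d in self._blocks.items() if a in self._pinned}

    def get(self, address: str) -> bytes:
        data = self._blocks[address]
        # Anyone can verify integrity by re-hashing the retrieved bytes.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data

    def __contains__(self, address: str) -> bool:
        return address in self._blocks


store = ContentStore()
kept = store.add(b"tas,2020,14.9")
dropped = store.add(b"tas,2020,15.0")
store.pin(kept)
store.garbage_collect()
print(kept in store, dropped in store)
```

Only the pinned content survives garbage collection, which is the role a pinning service plays for data that should stay available.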

How to cite: Kulüke, M., Kindermann, S., and Kölling, T.: IPFS Pinning Service for Open Climate Research Data, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6311, https://doi.org/10.5194/egusphere-egu23-6311, 2023.

X4.160
|
EGU23-6400
Valerio Vinciarelli, Andrea Orfino, Rossana Paciello, Daniele Bailo, Claudio Goffi, Kety Giuliacci, Manuela Sbarra, Jan Michalek, Harald Nedrebø, Jean-Baptiste Roquencourt, Yann Retout, Daniel Warren, and Janusz Lavrnja-Czapski

The European Plate Observing System (EPOS) is releasing a pan-European research infrastructure (RI) for Solid-Earth sciences that targets different scientific communities. The EPOS RI enables the sharing of data and resources, promotes collaboration and the harmonization of practices and methods, and fosters innovation and novel scientific discoveries. These principles of scientific and technological collaboration are the basis of the concepts of Open Source and Open Collaboration.

EPOS consists of essentially two components: firstly, the so-called Thematic Core Services (TCS) representing the Data providers from the scientific domains (e.g., Seismology, Satellite data etc.); secondly, the central integration node, namely the ICS-C (Integrated Core Services – Central Hub), representing the integrating ICT (Information and Communication Technology) system underpinning the EPOS Data Portal.

EPOS is currently releasing the Open-Source version of its architecture, based on microservices, which includes a GUI (Graphical User Interface) implementing the Data Portal, i.e., the human-oriented interface for accessing the assets made available by the TCS. It communicates with the ICS-C system by means of RESTful APIs, which also implement AAAI (authentication, authorization, accounting infrastructure). Through the APIs, the Data Portal queries the metadata catalog to discover and contextualize assets of interest provided by the TCS and documented as metadata.

In order to integrate heterogeneous datasets from the TCS, appropriate metadata and semantic descriptions are used to drive interactions with TCS resources or to construct a workflow to be executed across TCS.

EPOS Open Source also includes a microservice to enable the interaction with large-scale computing resources or geoscience software services, represented in EPOS as ICS-D (Integrated Core Services – Distributed). The processing performed through the ICS-D on the TCS data is also metadata-driven: the software executed on the ICS-D, which adds features and functionality to the ICS-C core, has its own metadata description and runs on the ICS-D through a plugin architecture.

The architecture is designed to integrate with e-Infrastructures such as GRID or CLOUD facilities; in particular, ongoing work includes achieving interoperability with EOSC (European Open Science Cloud) by means of FAIR web services. The EPOS architecture has also been used as a template in other initiatives, such as other Environmental Science RIs (e.g., ENVRI Catalogue of Services) and Jerico.

In the presentation we will describe the work done so far and the key concepts that led to the adoption of a microservice-based architecture released as open source, and provide perspectives for future extension of the project.

How to cite: Vinciarelli, V., Orfino, A., Paciello, R., Bailo, D., Goffi, C., Giuliacci, K., Sbarra, M., Michalek, J., Nedrebø, H., Roquencourt, J.-B., Retout, Y., Warren, D., and Lavrnja-Czapski, J.: EPOS Open Source: A platform for integrating high-quality research products and services., EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6400, https://doi.org/10.5194/egusphere-egu23-6400, 2023.

X4.161
|
EGU23-7455
|
ECS
Enxhi Kreshpa, Sabine Schröder, Niklas Selke, and Martin Schultz

In recent years, several repositories with curated environmental datasets have been created so that scientific communities have gained access to large collections of data from various domains. The level of data harmonisation and FAIRness, technical readiness and scalability of these repositories differ substantially. This restricts data exploration opportunities and limits scientific exploration with modern data science methods, such as machine learning. In the domain of air quality research, we have pioneered a data infrastructure for global observations of surface ozone and other air pollutant measurements that comes with rich possibilities for online data analysis. The data in the Tropospheric Ozone Assessment Report (TOAR) database is collected from about 40 different resource providers, from national and international environmental agencies to individual research groups around the world.
One of these data providers is OpenAQ, the world's first open, real-time air quality platform. Due to the higher standards of curation, the need for data harmonization, and the enriched metadata in the TOAR database, we had to develop an automated workflow to transport archived and real-time data from this provider to the TOAR database. The primary step is to clean and format all the OpenAQ records, according to the TOAR database schema, and concurrently, refine the metadata. The workflow includes tests for data sanity and checks if time series and station metadata can be amended, or whether new time series or station records must be created. The automation manager triggers the workflow hourly, so the database provides clean and updated air quality data at any time. 
The presentation describes the automated workflow and its design principles and discusses how such a workflow might be re-used in other environmental domains. All TOAR-related codes are open source.
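
The core steps described above (harmonising provider records, testing data sanity, and deciding whether to amend an existing time series or create a new one) can be sketched as follows. This is a minimal illustration under stated assumptions, not the TOAR workflow itself: the record fields, the target layout, the in-memory "database", and the plausibility range are all hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical plausibility range used for the sanity test; not a TOAR threshold.
VALID_RANGE = (0.0, 500.0)


def harmonise(raw: dict) -> dict:
    """Map a provider record onto a (hypothetical) target schema.

    Timestamps are expected to carry a UTC offset.
    """
    return {
        "station_id": raw["location"].strip().lower(),
        "variable": raw["parameter"].lower(),
        "timestamp": datetime.fromisoformat(raw["date_utc"]).astimezone(timezone.utc),
        "value": float(raw["value"]),
    }


def is_sane(record: dict) -> bool:
    low, high = VALID_RANGE
    return low <= record["value"] <= high


def ingest(raw_records: list[dict], database: dict) -> dict:
    """Amend an existing time series or create a new one; return simple counts."""
    stats = {"updated": 0, "created": 0, "rejected": 0}
    for raw in raw_records:
        record = harmonise(raw)
        if not is_sane(record):
            stats["rejected"] += 1
            continue
        key = (record["station_id"], record["variable"])
        if key in database:
            stats["updated"] += 1
        else:
            database[key] = {}
            stats["created"] += 1
        # Values are keyed by timestamp, so re-running the job does not duplicate them.
        database[key][record["timestamp"]] = record["value"]
    return stats


raw = [
    {"location": " Station-1 ", "parameter": "O3",
     "date_utc": "2023-04-25T10:00:00+00:00", "value": "41.5"},
    {"location": "station-1", "parameter": "o3",
     "date_utc": "2023-04-25T11:00:00+00:00", "value": "43.0"},
    {"location": "station-2", "parameter": "o3",
     "date_utc": "2023-04-25T10:00:00+00:00", "value": "-999"},
]
db: dict = {}
first_run = ingest(raw, db)
second_run = ingest(raw, db)  # simulates the next scheduled trigger
print(first_run, second_run)
```

Making the write step idempotent is one possible design choice for an hourly trigger: overlapping harvests then update existing values instead of inserting duplicates.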

How to cite: Kreshpa, E., Schröder, S., Selke, N., and Schultz, M.: An Automated Data Ingestion Workflow for the TOAR Database, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-7455, https://doi.org/10.5194/egusphere-egu23-7455, 2023.

X4.162
|
EGU23-8353
Emanuel Soeding, Andrea Poersch, Martin Weinelt, Pier Luigi Buttigieg, Helen Kollai, Sören Lorenz, and Yousef Razeghi

The interconnectivity of existing data infrastructures (DIS) across national and international initiatives (e.g. NFDI, EOSC and others) is an important goal on the way to a common interoperable data ecosystem. To achieve this, it is critical to harmonize the existing methods and concepts of research data collection and research data reuse among the DIS and in line with the FAIR principles.

Within the Helmholtz Association we maintain more than 50 data infrastructures in the field of Earth and Environment. Procedures of data handling, documentation and storage are hardly coordinated within Helmholtz, even less so within the larger community. To find out about the state of our infrastructures, the different approaches in data management procedures, technical capabilities, and concepts, we conducted a survey among all Helmholtz DIS. We asked questions related to their roles in the community, self-perception, quality control, curation, technology interfaces, data re-use and demands.

Based on this data we developed our vision to create a “Helmholtz data space”, unifying Earth and Environmental Centres and infrastructures and powering a new wave of large-scale, globally oriented, data-driven research. The Helmholtz Metadata Collaboration’s (HMC) mission is to federate (meta)data systems across Earth and Environment Centres and infrastructures throughout the Helmholtz Association, continuously aligning Helmholtz capacities to global norms and developments.

How to cite: Soeding, E., Poersch, A., Weinelt, M., Buttigieg, P. L., Kollai, H., Lorenz, S., and Razeghi, Y.: Towards a harmonized data ecosystem in Earth and Environment– a view on the Helmholtz Association’s Data Infrastructures, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8353, https://doi.org/10.5194/egusphere-egu23-8353, 2023.

X4.163
|
EGU23-8558
The Open Datacube Federation
(withdrawn)
Peter Baumann
X4.164
|
EGU23-10727
|
ECS
LISIRD: Making Solar Data More Accessible
(withdrawn)
Hunter Leise, Ransom Christofferson, Chris Lindholm, Douglas Lindholm, Slaton Spangler, Stéphane Béland, Odele Coddington, Donald Woodraska, and Christopher Pankratz
X4.165
|
EGU23-12254
Arnaud Masson, Bruno Merin, Vicente Navarro, Christophe Arviset, and Helen Middleton

ESA Datalabs is a collaborative scientific platform of the European Space Agency for exploiting data across the archives of the ESA science directorate missions (astronomy, planetary and heliophysics). It allows you to bring your code to the data under a private account, shareable with colleagues. Large amounts of data, such as the Gaia multi-billion-star catalogues, can be easily mounted and searched, allowing large-scale scientific investigations that are impossible on a regular laptop. It also provides multiple tools to access, process and visualize JWST data and was used during the recent commissioning of JWST. In other words, it handles both public and restricted-access data.

In the heliophysics domain, data from a few missions are already mounted, including all public data from the Solar Orbiter mission. A few Jupyter notebooks are already available to help the community make use of the full capabilities of the ESA archives. More will be made available in the future, including tools such as JHelioviewer and data mining. Interoperability is at the heart of the ESA Datalabs infrastructure, and connections to clouds such as the NASA HelioCloud and Amazon Web Services (AWS) accounts are in progress. Developed over the past few years, ESA Datalabs is scheduled to be made public to the scientific community in 2023.

How to cite: Masson, A., Merin, B., Navarro, V., Arviset, C., and Middleton, H.: ESA DataLabs: an open science platform relevant to Heliophysics, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12254, https://doi.org/10.5194/egusphere-egu23-12254, 2023.

X4.166
|
EGU23-12272
|
ECS
Melanie Lorenz, Kirsten Elger, Inke Achterberg, and Malte Semmler

The world of geosciences is very broadly positioned on the path to Open Science. The development of a research infrastructure must therefore address the fundamental needs within each discipline. In the geosciences, the spectrum of requirements for the data alone ranges from highly standardized, large real-time data made available internationally (e.g. in seismology or geodesy) to the full spectrum of small and highly variable data from long-tail data communities whose established practice is to share data via tables printed in research articles.

FID GEO is the specialized information service for Geosciences in Germany and has offered publication services and consulting around the full spectrum of Open Science in the Geosciences since 2016. Funded by the German Research Foundation (DFG), FID GEO is a service of the GFZ German Research Centre for Geosciences (GFZ) and the Göttingen State and University Library (SUB). The project provides broad access to digital knowledge resources, contributing to an open information infrastructure. The service portfolio includes electronic publishing of research results in our domain repositories: GEO-LEO e-docs, the repository for texts and geological maps, is hosted at SUB, and GFZ Data Services, the domain repository for research data and scientific software, is hosted at GFZ. In addition, we offer digitization services, especially for (older) journal series and reports, and a broad consulting portfolio on Open Science topics. In doing so, FID GEO advocates a holistic view of the chain of scientific results, from sample to data and software to scientific articles, and promotes that the individual elements are digitally linked in the best possible way.

From the beginning, FID GEO has developed services that facilitate the shift towards Open Science by engaging the geoscience community. The FID GEO website, our newsletter and Twitter account are tools to connect us with the community. We inform the majority of German geoscientists with regular publications in the journal “GMIT - Geowissenschaftliche Mitteilungen”, which has also been published on GEO-LEO e-docs since 2021. Over the last six years, active interactions with the community during conferences, workshops, talks and through online questionnaires have revealed that there is still a high demand for information on open science practices. Workshops and talks have proven to be very successful tools to meet the large need for discussion. They not only allow us to directly address questions or uncertainties regarding practical aspects of open science practices, but also offer a suitable framework for tailoring the information to each research group. To improve our publication services and to intensify the open information culture in the geosciences, FID GEO collaborates with strategic (inter)national initiatives (like NFDI4Earth), with German geoscience societies and with other library-related projects supporting the development of open research data infrastructures.

How to cite: Lorenz, M., Elger, K., Achterberg, I., and Semmler, M.: Establish Open Science practices throughout the geoscience community in Germany with the FID GEO services, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12272, https://doi.org/10.5194/egusphere-egu23-12272, 2023.

X4.167
|
EGU23-13659
|
ECS
Benjamin Louisot, Christof Lorenz, Mostafa Hadizadeh, Ingo Völksch, and David Schäfer

In recent years, the requirements for data from earth system sciences have increased massively. Data from observation systems needs to be transferred into larger research data infrastructures, evaluated and flagged via well-defined quality checks, enriched with standardized metadata and finally made available to the public via standard interfaces. In order to fulfill the FAIR principles, we also have to ensure transparency and reproducibility of all these steps. Moreover, the rising demand for near-real-time (NRT) data requires the whole data pipeline to run operationally with minimal manual effort.

However, in many cases, there are still heterogeneous data landscapes to be found without centralized control of data, data processing, version control and QA/QC. This is often aggravated by inconvenient, outdated and isolated tools and software solutions.

Therefore, we develop and implement an adaptable automated pipeline, which combines the assurance of data consistency, QA/QC (Quality Assurance / Quality Control), graphically supported validation and unified persistence and publication of data. User friendliness is achieved by making the system configurable and trackable through lightweight user interfaces over the complete data lifecycle. By only using open-source software solutions and applying community standards for data formats and interfaces, a high level of sustainability and independence can be ensured.
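
To make the idea of evaluating and flagging data via well-defined quality checks more concrete, the following sketch applies two simple automated checks to a short sensor series. It is an illustration only and not the pipeline presented here: the flag names, the thresholds, and the two checks (a range check and a step check against the last good value) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    timestamp: str
    value: float
    flag: str = "UNCHECKED"


def run_qc(series: list[Observation], low: float, high: float,
           max_step: float) -> list[Observation]:
    """Flag each observation as GOOD, DOUBTFUL, or BAD.

    - BAD: value outside the physically plausible range [low, high].
    - DOUBTFUL: value jumps by more than max_step from the last GOOD value.
    - GOOD: everything else.
    Original values are never altered, so every decision stays traceable.
    """
    last_good = None
    for obs in series:
        if not (low <= obs.value <= high):
            obs.flag = "BAD"
            continue
        if last_good is not None and abs(obs.value - last_good.value) > max_step:
            obs.flag = "DOUBTFUL"
            continue
        obs.flag = "GOOD"
        last_good = obs
    return series


# Hypothetical air temperature readings in degrees Celsius.
readings = [
    Observation("2023-04-25T10:00Z", 12.1),
    Observation("2023-04-25T10:10Z", 12.4),
    Observation("2023-04-25T10:20Z", 30.0),    # implausible jump
    Observation("2023-04-25T10:30Z", 12.6),
    Observation("2023-04-25T10:40Z", -999.0),  # typical missing-value marker
]
flags = [o.flag for o in run_qc(readings, low=-40.0, high=60.0, max_step=5.0)]
print(flags)
```

Flagging rather than deleting suspicious values keeps the raw record intact, which supports the transparency and reproducibility that the FAIR principles call for.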

In this presentation, we hence want to demonstrate such an end-to-end data pipeline that finally allows for the FAIRification of typical environmental sensor data.

How to cite: Louisot, B., Lorenz, C., Hadizadeh, M., Völksch, I., and Schäfer, D.: User-friendly data pipeline for FAIRification of environmental sensordata, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13659, https://doi.org/10.5194/egusphere-egu23-13659, 2023.

X4.168
|
EGU23-14950
Bruce Wilson, Megan Buzanowicz, Amanda Leon, Sara Lubkin, Leigh Sinclair, Geoffry Stano, Michele Thornton, Matthew Tisdale, Tammy Walker, Yaxing Wei, and Stephanie Wingo

This presentation summarizes information on the needs of data producers and data users gathered from the combination of a March 2022 two-day on-line workshop and the nearly 30-year history of the NASA Earth Observing System Data and Information System (EOSDIS) Distributed Active Archive Centers (DAACs). The workshop included over 100 participants, primarily from the United States, with the first day focusing on the needs of data users and the second day focusing on the needs of data producers. Based on both the workshop and the collective experience of the DAACs, both data producers and data users benefit substantially when there is an early partnership between the research project producing the data and the data archive which will publish the data. The highly heterogeneous nature of airborne and field research data presents particular challenges for discovery, particularly in the context of systems that are optimized for discovery and delivery of on-orbit Earth science data. The DAACs' experience also demonstrates an evolution of best practices for working with this kind of data. However, systems which have been in operation for decades often have technical debt, which can constrain the evolution of the research data infrastructure. The migration of EOSDIS into a commercial cloud environment presents several interesting opportunities for addressing the data producer and data user needs identified by the workshop and the experience of the DAACs.

How to cite: Wilson, B., Buzanowicz, M., Leon, A., Lubkin, S., Sinclair, L., Stano, G., Thornton, M., Tisdale, M., Walker, T., Wei, Y., and Wingo, S.: Data Producer and Data User Needs for Airborne and Field Earth Science Research Measurements, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-14950, https://doi.org/10.5194/egusphere-egu23-14950, 2023.

X4.169
|
EGU23-15820
Matt Fry and Gareth Old

The UK is planning to implement a £38M (€43M) Flood and Drought Research Infrastructure to facilitate the hydrological science and innovation needed to underpin the UK’s adaptation and resilience to floods and droughts. The UKRI’s intent to invest from its infrastructure fund was published following a ~2-year scoping study that determined research community requirements through reviews of comparable infrastructures across the UK, Europe and globally, community workshops and questionnaires, and direct engagement with potential beneficiaries from research, industry and government bodies. Significantly, the scoping study identified the importance of a digital infrastructure to enable a step-change in access to hydrological monitoring data.  This would complement community access to the expected physical infrastructure for monitoring all phases of the water cycle across a range of catchment types.

Key requirements for the proposed digital infrastructure, to be delivered through 2023-2028, include access to UK-wide hydrological data alongside new catchment observatory data, supporting field monitoring and innovation through open digital systems, advancing the state of the art for sensor data management, linking monitoring activities more closely with research data archives and delivering support for open science. The digital infrastructure would leverage technological developments e.g. in cloud-based virtual research environments, and be delivered alongside a significant community capacity building effort to support cultural change and enable researchers to transform ways-of-working to maximise its potential benefits.

How to cite: Fry, M. and Old, G.: Design of a digital infrastructure for hydrological research, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15820, https://doi.org/10.5194/egusphere-egu23-15820, 2023.

X4.170
|
EGU23-15958
|
ECS
Mostafa Hadizadeh, Philipp S Sommer, Christof Lorenz, and Linda Baldewein

The Thematic Real-Time Environmental Distributed Data Services (THREDDS) Data Server is an open-source, Java-based web application that enables metadata and data access to scientific netCDF datasets. In recent years, more and more research institutes implemented THREDDS to give researchers and other end-users access to a wide range of real-time and archival data sets from earth system sciences. 

THREDDS provides a number of features and interfaces that facilitate the interactive and automated exploration, standardization and use of data, such as the automated generation of ISO-formatted metadata files or the provision of OGC services (WMS and WCS). However, the configuration of THREDDS via XML catalogs remains difficult and is usually restricted to system admins. In particular, the publication and consistent maintenance of a large number of datasets is error-prone and hence difficult and time-consuming.

Within the Model Data Explorer (MDE, https://model-data-explorer.readthedocs.io), a cross-institutional project to simplify the FAIR publication of model data on the web, we develop a module to overcome these configuration issues and enable scientists to make their environmental research data available on the web. This MDE-THREDDS module manages the catalogs and configurations of the THREDDS data server by providing a user-friendly web interface for handling major components of THREDDS, including catalogs and web services. A flexible permission system enables scientists and other data producers to add and update their own datasets without the need for manually editing the underlying THREDDS catalogs. This permission system further allows server administrators to moderate and facilitate the publication of data on the web by scientists and other end-users, which ensures a standardized and consistent THREDDS catalog infrastructure.
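
The general idea of generating THREDDS catalogs from structured records instead of editing XML by hand can be sketched with the Python standard library. This is not the MDE-THREDDS code: the function name, the record layout, and the sample paths are hypothetical, and the output is deliberately incomplete (a working catalog also needs service definitions and further metadata).

```python
import xml.etree.ElementTree as ET

# Namespace of THREDDS client catalogs (InvCatalog 1.0).
THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"


def build_catalog(name: str, datasets: list[dict]) -> str:
    """Render a minimal catalog with one datasetScan element per record.

    Each record is a dict with the keys "name", "path", and "location".
    Duplicate paths are rejected, since every published path must be unique.
    """
    seen_paths: set[str] = set()
    ET.register_namespace("", THREDDS_NS)
    catalog = ET.Element(f"{{{THREDDS_NS}}}catalog", {"name": name})
    for entry in datasets:
        if entry["path"] in seen_paths:
            raise ValueError(f"duplicate dataset path: {entry['path']}")
        seen_paths.add(entry["path"])
        ET.SubElement(
            catalog,
            f"{{{THREDDS_NS}}}datasetScan",
            {
                "name": entry["name"],
                "path": entry["path"],
                "location": entry["location"],
            },
        )
    return ET.tostring(catalog, encoding="unicode")


# Hypothetical dataset records, e.g. as they might be stored by a web application.
records = [
    {"name": "Model run 1", "path": "model/run1", "location": "/data/model/run1/"},
    {"name": "Model run 2", "path": "model/run2", "location": "/data/model/run2/"},
]
catalog_xml = build_catalog("Example catalog", records)
print(catalog_xml)
```

Validating records before rendering is one way such a tool can catch configuration errors early, rather than at server start-up.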

Overall, with MDE-THREDDS, we want to give scientists and other data producers a simple and user-friendly framework for making their research data open and FAIR through a wide range of standardized and well-established web interfaces.

How to cite: Hadizadeh, M., Sommer, P. S., Lorenz, C., and Baldewein, L.: MDE-Thredds: A Django-based plugin for managing THREDDS data server, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15958, https://doi.org/10.5194/egusphere-egu23-15958, 2023.

X4.171
|
EGU23-17052
Peter Konopatzky, Robin Heß, Roland Koppe, and Andreas Walter

The need for discoverability and accessibility of research data and metadata is huge, driven both by the FAIR principles and user requirements regarding research data portals, repositories and search engines. Interactive, visual and especially map-based exploration of research data is becoming increasingly popular. Bringing together technical reality and custom user vision in the design and provision of services for interactive map viewers without sacrificing sustainability can be challenging.

Thanks to our O2A Spatial framework, the classic Spatial Data Infrastructure (SDI) components (storage, database, geo web server, catalogue) can be (re)deployed and configured quickly and with low effort. This includes the creation and curation of data products which can be compiled from differently sourced data. The list of currently supported data sources contains the PANGAEA repository, the Observations to Archives and Analysis (O2A) pipeline, Sensor Observation Services (SOS) and data provided by scientists directly. Simple metadata harmonisation is possible. Publicly available Standard Operating Procedures (SOPs) and data exchange specifications document the ways in which scientists and institutes can have their desired products hosted.

The modular, scalable, flexible and highly automated SDI has been developed and operated at Alfred Wegener Institute (AWI) for more than a decade, continuously improving and providing map services for GIS clients and portals including the Marine Data and Earth Data Portals (see ESSI4.1).

Long-term maintainability is ensured through the use of common open-source technologies, established geodata standards, containerisation and a high degree of automation. The modularity of O2A Spatial and SDI components ensures flexibility and future expandability. Because the SDI is embedded into O2A, its development and operation are secured in the long run in terms of both funding and staff.

How to cite: Konopatzky, P., Heß, R., Koppe, R., and Walter, A.: A flexible yet sustainable Spatial Data Infrastructure for the Integration of Distributed Research Data, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-17052, https://doi.org/10.5194/egusphere-egu23-17052, 2023.

X4.172
|
EGU23-11116
Doris Maicher, Thorge Petersen, Hela Mehrtens, and Pina Springer

The International Generic Sample Number (IGSN) is a crucial tool for ensuring the traceability and preservation of physical specimens in the Earth Science community. As a persistent identifier (PID), IGSN serves as a link between published digital data and the physical samples stored in a repository, enabling the creation of synergies with other services through the harvesting of machine-readable data.

The IGSN can be assigned to a physical specimen at the time of collection, either on board a research vessel or during a field campaign. This unique identifier will follow the sample through the various stages of processing and analysis. In this use case, we demonstrate how IGSN can be minted for sediment cores directly on board a research vessel and then subsequently linked to relevant research data infrastructures (RDIs) such as DSHIP and PANGAEA. This allows for the traceability and easy identification of the samples as they are transported and stored in different repositories.

Incorporating IGSN into an RDI helps to broadcast the existence of physical material and makes it more easily discoverable by researchers. This is especially useful for marine field work, which can be expensive and may not be accessible to all researchers. By making information about samples available as open access, researchers are able to easily locate and reuse existing material, which can be particularly beneficial for smaller research projects or research communities with limited resources. This is especially relevant in times of crisis, when access to certain regions may be restricted and there is an increased demand for the reuse of existing samples.

In our research institutions, there is close collaboration between RDI providers and sample curators to manage both the digital data and the physical objects, such as plant samples in a herbarium, rocks and sediment cores, and biological material. In this presentation, we will use our case studies to discuss the successes of sample management in relation to IGSN. In addition, we will address the challenges that we have encountered and how we are working to overcome them. Our goal is to provide reliable services to our communities with a long-term perspective, and we believe that the incorporation of IGSN into RDIs can help to foster cultural change and encourage international collaboration in the Earth Science community. In addition, the use of IGSN and RDIs can contribute to the sustainability and reproducibility of research.

How to cite: Maicher, D., Petersen, T., Mehrtens, H., and Springer, P.: Leveraging IGSN to Enhance Data Management in Research Institutions, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-11116, https://doi.org/10.5194/egusphere-egu23-11116, 2023.