ESSI3.9

Managing Geochemical Data from Field to Lab to Publication to Archive

Co-organized by GI2/GMPV1

Convener: Kirsten Elger | Co-conveners: Alexander Prent, Lesley Wyborn

vPICO presentations: Fri, 30 Apr, 13:30–15:00 (CEST)

Public information:

Significant investments are made globally to study samples from the Earth, the Moon, and other planetary materials in research laboratories to extract new scientific insights about the history and state of our solar system. Expensive laboratory infrastructure and advanced instrumentation generate data at ever-increasing levels of precision, resolution, and volume. These data need to be efficiently managed and losslessly transferred from instruments in the lab, where they are not accessible to others, to a “Collaboration” domain, where researchers can share and jointly analyze them, and finally to the “Public” domain, complete with all relevant information about the analytical process and uncertainty, and cross-references to originating samples and publications. Many current solutions are bespoke and inefficient; for example, they lack the unique identifiers for samples, instruments, and data sets that are needed to trace the analytical history of the data.

This session provides an overview of all facets of geochemical data management since the first “Editors Roundtable” in 2007, an initial meeting of editors, publishers, and database providers to implement consistent practices for reporting geochemical data in the literature and sharing these data in geochemical databases. What has happened since? Our presentations range from initiatives supporting the full workflow, to individual tools for data management in the lab, to specific data collections and data publication initiatives, to the overarching aim of linking systems and the need for standards.

vPICO presentations: Fri, 30 Apr

Chairpersons: Kerstin Lehnert, Kirsten Elger, Alexander Prent

13:30–13:35

5-minute convener introduction

13:35–13:45

EGU21-16420

solicited

Managing Open and FAIR Data in Geochemistry: Where are we a decade after the Editors Roundtable?

Steven L Goldstein, Kerstin Lehnert, and Albrecht W Hofmann

The ultimate goal of research data management is to achieve the long-term utility and impact of data acquired by research projects. Proper data management ensures that all researchers can validate and replicate findings, and reuse data in the quest for new discoveries. Research data need to be open, consistently and comprehensively documented for meaningful evaluation and reuse following domain-specific guidelines, and available for reuse via public data repositories that make them Findable, persistently Accessible, Interoperable, and Reusable (FAIR).

In the early 2000s, the development of geochemical databases such as GEOROC and PetDB underscored that the reporting and documenting practices of geochemical data in the scientific literature were inconsistent and incomplete. The original data often could not be recovered from the publications, and essential information about samples, analytical procedures, data reduction, and data uncertainties was missing, limiting meaningful reuse of the data and reproducibility of the scientific findings. To prevent such poor scientific practice from damaging the health of the entire discipline, we launched the Editors Roundtable in 2007, an initiative to bring together editors, publishers, and database providers to implement consistent publication practices for geochemical data. Recognizing that mainstream scientific journals were the most effective agents to rectify problems in data reporting and implement best practices, members of the Editors Roundtable created and signed a policy statement that laid out ‘Requirements for the Publication of Geochemical Data’ (Goldstein et al. 2014, http://dx.doi.org/10.1594/IEDA/100426). This presentation will examine the impact of this initial policy statement, assess the current status of best practices for geochemical data management, and explore what actions are still needed.

While the Editors Roundtable policy statement led to improved data reporting practices in some journals, and provided the basis for data submission policies and guidelines of the EarthChem Library (ECL), data reporting practices overall remained inconsistent and inadequate. Only with the formation of the Coalition for Publishing Data in the Earth and Space Sciences (COPDESS, www.copdess.org), which extended the Editors Roundtable to include publishers and data facilities across the entire Earth and Space Sciences, along with the subsequent AGU project ‘Enabling FAIR Data’, has the implementation of new requirements by publishers, funders, and data repositories progressed and led to significant compliance with the FAIR Data Principles. Submission of geochemical data to open and FAIR repositories has increased substantially. Nevertheless, standard guidelines for documenting geochemical data and standard protocols for exchanging geochemical data among distributed data systems still need to be defined, and structures to govern such standards need to be identified by the global geochemistry community. Professional societies such as the Geochemical Society, the European Association of Geochemistry, and the International Association of GeoChemistry can and should take a leading role in this process.

How to cite: Goldstein, S. L., Lehnert, K., and Hofmann, A. W.: Managing Open and FAIR Data in Geochemistry: Where are we a decade after the Editors Roundtable?, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-16420, https://doi.org/10.5194/egusphere-egu21-16420, 2021.

13:45–13:47

EGU21-10569

SciDataMover: Moving Geochemistry Data from the Lab through to Publication

Ryan Fraser, Samuel Boone, Alexander Prent, Jens Klump, and Guido Aben

The SciDataMover platform is a discipline- and scale-agnostic, lightweight, open-source Data Movement Platform that transfers data, coupled with metadata, from laboratories to shared workspaces and then to repositories. The SciDataMover Platform leverages existing lightweight technologies that have demonstrated that they can be sustainably managed and affordably maintained.

Despite significant investments in analytical instruments in Australian research laboratories relevant to earth sciences and particularly geochemistry, there has been underinvestment in storage and efficient, lossless transfer of data from ‘Private’ lab instruments to ‘Collaboration’ domains where researchers can analyse and share data, and then persist it to trusted ‘Publication’ domains where researchers can persistently store the data that supports their scholarly publications.

SciDataMover is a FAIR data movement platform that enables data from instruments to move in a scalable and sustainable manner and comprises:

1) a data service to transfer data/metadata directly from instruments
2) collaboration areas to process, refine, standardise and share this data
3) a mechanism to transfer data supporting publications to a trusted repository (e.g., domain, institutional).
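The three stages above amount to a copy-and-verify chain. The sketch below is a Python illustration, not SciDataMover code; it assumes that "lossless" transfer is checked by comparing SHA-256 checksums before and after each copy, and that metadata travel with the file in a JSON sidecar. The function names and the `.meta.json` convention are hypothetical.

```python
import hashlib
import json
import shutil
from pathlib import Path


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def move_stage(src: Path, dest_dir: Path, metadata: dict) -> Path:
    """Copy one file to the next domain and verify that it arrived intact.

    Hypothetical helper: writes <name>.meta.json next to the copy, holding the
    supplied metadata plus the checksum so later stages can re-verify the file.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    expected = sha256sum(src)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)  # copies content and file timestamps
    if sha256sum(dest) != expected:
        dest.unlink()
        raise IOError(f"checksum mismatch while copying {src}")
    sidecar = dest_dir / (src.name + ".meta.json")
    sidecar.write_text(json.dumps({**metadata, "sha256": expected}, indent=2))
    return dest
```

Calling `move_stage` once for the Private-to-Collaboration step and again for the Collaboration-to-Publication step carries the same checksum forward, so any later copy can be re-verified against the original.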

Because the Platform is built from existing components, it will give researchers ready access to laboratory data when and where they need it, along with the ability to collaborate with colleagues even during a pandemic, when physical distancing is required. The benefits of SciDataMover are long-term persistence of laboratory-generated data (at various stages, from minimally processed to final published form), more efficient collaboration, and enhanced scientific reproducibility.

How to cite: Fraser, R., Boone, S., Prent, A., Klump, J., and Aben, G.: SciDataMover: Moving Geochemistry Data from the Lab through to Publication, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10569, https://doi.org/10.5194/egusphere-egu21-10569, 2021.

13:47–13:49

EGU21-13832

ECS

Implementing the Sparrow laboratory data system in multiple subdomains of geochronology and geochemistry

Daven Quinn, Benjamin Linzmeier, Kurt Sundell, George Gehrels, Simon Goring, Shaun Marcott, Stephen Meyers, Shanan Peters, Jake Ross, Mark Schmitz, Bradley Singer, and John Williams

Data sharing between laboratories is critical for building repeatable, comparable, and robust geochronology and geochemistry workflows. Meanwhile, in the broader geosciences, there is an increasing need for standardized access to aggregated geochemical data tied to basic geological context. Such data can be used to enrich sample and geochemical data repositories (e.g., EarthChem, Geochron.org, publisher archives), align geochemical context with other datasets that capture global change (e.g., Neotoma, the Paleobiology Database), and calibrate digital Earth models (e.g., Macrostrat) against geochronology-driven assessments of geologic time.

A typical geochemical lab manages a large archive of interpreted data; standardizing and contributing data products to community-level archives entails significant manual work that is not usually undertaken. Furthermore, without widely accepted interchange formats, this effort must be repeated for each intended destination.

Sparrow (https://sparrow-data.org), in development by a consortium of geochronology labs, is a standardized system designed to support labs’ efforts to manage, contextualize, and share their geochemical data. The system augments existing analytical workflows with tools to manage metadata (e.g., projects, sample context, embargo status) and software interfaces for automated data exchange with community facilities. It is extensible for a wide variety of geochemical methods and analytical processes.
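One piece of the metadata management described above, embargo status, can be illustrated with a small filter that decides which records may be pushed to a community archive. This is a hypothetical Python sketch, not Sparrow's data model or API; the `AnalysisSession` fields are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class AnalysisSession:
    """Hypothetical, simplified lab record (not Sparrow's actual schema)."""
    session_id: str
    sample_name: str
    project: str
    embargo_until: Optional[date] = None  # None means the data are not embargoed

    def is_public(self, today: date) -> bool:
        return self.embargo_until is None or today >= self.embargo_until


def publishable(sessions: List[AnalysisSession], today: date) -> List[AnalysisSession]:
    """Select the sessions whose embargo (if any) has expired by `today`."""
    return [s for s in sessions if s.is_public(today)]
```

Keeping the embargo date on the record itself, rather than in a separate list, means an automated export job only needs the record to decide whether it can be shared.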

In this update, we will report on the implementation of Sparrow in the Arizona LaserChron Center detrital zircon facility, and on how that lab is using the system to capture geological context across its data archive. We will review similar integrations underway with U-Pb, ⁴⁰Ar/³⁹Ar, SIMS, optically stimulated luminescence, thermochronology, and cosmogenic nuclide dating. We will also discuss preliminary efforts to aggregate the output of multiple chronometers to refine age calibrations for the Macrostrat stratigraphic model.

How to cite: Quinn, D., Linzmeier, B., Sundell, K., Gehrels, G., Goring, S., Marcott, S., Meyers, S., Peters, S., Ross, J., Schmitz, M., Singer, B., and Williams, J.: Implementing the Sparrow laboratory data system in multiple subdomains of geochronology and geochemistry, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13832, https://doi.org/10.5194/egusphere-egu21-13832, 2021.

13:49–13:51

EGU21-3595

ECS

Advancing Data Curation and Archiving: an Application of Coding to Lab Management in the Geosciences

Tierney Latham, Catherine Beck, Bruce Wegter, and Ahra Wu

Technological advances have rapidly increased the capabilities and ubiquity of scientific instrumentation. Coupled with the demand for greater transparency and reproducibility in science, these advances have necessitated new systems of data management and archival practices. Laboratories are working to update their methods of data curation in line with these evolving best practices, moving data from often disorderly private domains to publicly available, collaborative platforms. At the Hamilton Isotope Laboratory (HIL) of Hamilton College, the isotope ratio mass spectrometer (IRMS) is used across STEM disciplines for a combination of student, faculty, and course-related research, by both internal and external users. With over 200 sets of analytical runs processed in the past five years, documenting instrument usage and archiving the data produced is crucial to maintaining a state-of-the-art facility. However, before this project, the HIL faced significant barriers to proper data curation, storage, and accessibility, including: a) data files were produced with variable formats and nomenclature; b) data files were difficult to interpret without explanation from the lab technician; c) key metadata tying results to their respective researchers and projects were missing; d) access to data was limited because files were stored on a single computer; and e) data curation was an intellectual responsibility and burden for the lab technician. Additionally, because the HIL is housed within an undergraduate institution, the high turnover of lab groups created further barriers to preserving long-term institutional knowledge, as students worked with the HIL for a year or less. These factors necessitated new data management practices to ensure the accessibility and longevity of scientific data and metadata.

In this project, 283 Excel files of previously recorded data generated by the HIL IRMS were modified and cleaned to prepare the data for submission to EarthChem, a public repository for geochemical data. Existing Excel files were edited manually, several new R scripts were written and applied, and procedures were established to backtrace projects and collect key metadata. Most critically, a new internal system of data collection was established with a standardized nomenclature and framework. In future use of the IRMS, data will be exported directly into a template compatible with EarthChem, removing barriers for principal investigators (PIs) and research groups to archive their data in the public domain upon completion of their projects and publications.
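The two core cleaning steps, a standardized run-naming scheme and a mapping of inconsistent legacy column headers onto template columns, can be sketched as follows. The project used R; this Python version is only an illustration, and the naming pattern, header aliases, and target column names are hypothetical, not the HIL convention or the EarthChem template.

```python
import re
from datetime import date

# Hypothetical aliases mapping inconsistent legacy headers to template columns
HEADER_ALIASES = {
    "d13c": "delta13C", "delta 13c": "delta13C",
    "d18o": "delta18O", "delta 18o": "delta18O",
    "sample": "sample_id", "sample id": "sample_id", "id": "sample_id",
}


def slug(text: str) -> str:
    """Lower-case text and collapse runs of non-alphanumerics into '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def standard_run_name(run_date: date, pi_surname: str, project: str,
                      lab: str = "HIL", instrument: str = "IRMS") -> str:
    """Build a name following the hypothetical pattern <lab>_<instrument>_<YYYYMMDD>_<PI>_<project>."""
    return f"{lab}_{instrument}_{run_date:%Y%m%d}_{slug(pi_surname)}_{slug(project)}"


def normalise_header(header: str) -> str:
    """Map a legacy header to its template name; unknown headers pass through trimmed."""
    key = re.sub(r"\s+", " ", header.strip().lower().replace("δ", "d"))
    return HEADER_ALIASES.get(key, header.strip())
```

Headers that are not in the alias table are passed through unchanged rather than guessed, so they can be reviewed by hand.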

How to cite: Latham, T., Beck, C., Wegter, B., and Wu, A.: Advancing Data Curation and Archiving: an Application of Coding to Lab Management in the Geosciences, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-3595, https://doi.org/10.5194/egusphere-egu21-3595, 2021.

13:51–13:53

EGU21-6147

ECS

Petrological microscopy data workflow – an example from Cap de Creus, NE Spain

Richard Wessels, Thijmen Kok, Hans van Melick, and Martyn Drury

Publishing research data in a Findable, Accessible, Interoperable, and Reusable (FAIR) manner is increasingly valued and nowadays often required by publishers and funders. Because experimental research data provide the backbone for scientific publications, it is important to publish these data as FAIRly as possible to enable their reuse and citation, thereby increasing the impact of research.

The structural geology group at Utrecht University is collaborating with the EarthCube-funded StraboSpot initiative to develop (meta)data schemas, templates and workflows to support researchers in collecting and publishing petrological and microstructural data. These data will be made available in a FAIR manner through the EPOS (European Plate Observing System) data publication chain (https://epos-msl.uu.nl/).

The data workflow under development currently includes: a) collecting structural field (meta)data compliant with the StraboSpot protocols, b) creating thin sections oriented in three dimensions by applying a notch system (Tikoff et al., 2019), c) scanning and digitizing thin sections using a high-resolution scanner, d) automated mineralogy through energy-dispersive X-ray spectroscopy (EDS) on a scanning electron microscope (SEM), and e) high-resolution geochemistry using a microprobe. The purpose of this workflow is to track geochemical and structural measurements and observations throughout the analytical process.

This workflow is applied to samples from the Cap de Creus region in northeast Spain. The pre-Cambrian metasediments of this region, located in the axial zone of the Pyrenees, underwent HT-LP (high-temperature, low-pressure) greenschist- to amphibolite-facies metamorphism, were intruded by pegmatitic bodies, and are transected by greenschist-facies shear zones. Cap de Creus is a natural laboratory for studying the deformation history of the Pyrenees, and samples from the region are ideal for testing and refining the data workflow. In particular, the geochemical data collected under this workflow are used as input for modelling the bulk rock composition using Perple_X.
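A common way to turn modal mineralogy and microprobe analyses into a bulk composition is a mass balance: each phase's oxide composition is weighted by its mass fraction, i.e. its volume (modal) fraction times its density. The abstract does not say whether the authors use this approach; the Python sketch below is an illustration under that assumption, and the input structure is hypothetical.

```python
from typing import Dict, List, Tuple

# (volume_fraction, density_g_cm3, {oxide: wt%}) for each phase
Phase = Tuple[float, float, Dict[str, float]]


def bulk_composition(phases: List[Phase]) -> Dict[str, float]:
    """Mass-balance bulk-rock oxide wt%: sum_i(v_i*rho_i*c_ij) / sum_i(v_i*rho_i)."""
    masses = [v * rho for v, rho, _ in phases]
    total = sum(masses)
    oxides = sorted({ox for _, _, comp in phases for ox in comp})
    return {
        ox: sum(m * comp.get(ox, 0.0) for m, (_, _, comp) in zip(masses, phases)) / total
        for ox in oxides
    }
```

Weighting by density matters: two phases with equal modal abundance but different densities contribute unequally to the bulk-rock mass.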

In the near future, the workflow will be complemented by adding unique identifiers to the collected samples using IGSNs (International Geo Sample Numbers), and by incorporating a StraboSpot-developed application for microscopy-based image correlation. This workflow will then be refined and included in the broader correlative microscopy workflow of the upcoming EXCITE project, an H2020-funded European collaboration of electron and X-ray microscopy facilities and researchers aimed at structural and chemical imaging of Earth materials.

How to cite: Wessels, R., Kok, T., van Melick, H., and Drury, M.: Petrological microscopy data workflow – an example from Cap de Creus, NE Spain, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-6147, https://doi.org/10.5194/egusphere-egu21-6147, 2021.

13:53–13:55

EGU21-16405

Collecting geochemical data of deep formation fluids for Geothermal Fluid Atlas for Europe

Katrin Kieling, Simona Regenspurg, Károly Kovács, Zsombor Fekete, Alberto Sánchez Miravalles, Tamás Madarász, and Éva Hartai

Most problems in deep geothermal operations are related to the chemistry of the geothermal fluid, which can cause deleterious physical and chemical reactions such as degassing, mineral precipitation, or corrosion. However, data on fluid properties are still scarce, largely because these properties are difficult to determine at in situ geothermal conditions, and because existing data are scattered across countries and are often the “property” of commercial operators of geothermal power plants.

The EU H2020 project REFLECT aims to collect existing and new data on geothermal fluids across Europe through field measurements, detailed lab experiments simulating in situ conditions, and by calculations. These data will be implemented in case-specific predictive models simulating reactions at geothermal sites, as well as in a European geothermal Fluid Atlas.

To harmonize the metadata for different fluid samples, REFLECT partners plan to register IGSNs (International Geo Sample Numbers) for fluid and reservoir rock samples collected and analysed within the project. The IGSN is a unique sample identifier, the equivalent of a DOI for publications. It was originally developed for drill cores and has been extended to various sample types, including fluid samples (seawater, river or lake water, hydrothermal fluids, porewater). Registering fluid and rock samples with an IGSN will help keep the data accessible and reusable even after the fluid sample itself has been destroyed.

All data produced and collected within REFLECT form the basis of the European Geothermal Fluid Atlas, which will include query and filtering tools to explore the database with a GIS-based map visualization. The Atlas makes the data accessible to the geothermal community and the general public. The aim is to create a database that can easily be integrated into other databases, so that the Fluid Atlas can complement existing initiatives for geological data collection.
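The query and filtering tools described above boil down to spatial and attribute filters over sample records. The Python sketch below is illustrative only: the `FluidSample` fields, the placeholder identifiers, and the inclusive bounding-box convention are assumptions, not the Fluid Atlas schema.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class FluidSample:
    sample_id: str        # e.g. an IGSN; values used in examples are placeholders
    lat: float
    lon: float
    temperature_c: float  # measured fluid temperature
    tds_g_l: float        # total dissolved solids


def query(samples: List[FluidSample],
          bbox: Tuple[float, float, float, float],
          min_temp_c: Optional[float] = None) -> List[FluidSample]:
    """Return samples inside bbox = (min_lon, min_lat, max_lon, max_lat), edges inclusive,
    optionally restricted to fluids at or above min_temp_c."""
    min_lon, min_lat, max_lon, max_lat = bbox
    hits = []
    for s in samples:
        if not (min_lon <= s.lon <= max_lon and min_lat <= s.lat <= max_lat):
            continue
        if min_temp_c is not None and s.temperature_c < min_temp_c:
            continue
        hits.append(s)
    return hits
```

In a GIS front end, the bounding box would typically come from the current map view, and the attribute filters from user controls.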

How to cite: Kieling, K., Regenspurg, S., Kovács, K., Fekete, Z., Sánchez Miravalles, A., Madarász, T., and Hartai, É.: Collecting geochemical data of deep formation fluids for Geothermal Fluid Atlas for Europe, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-16405, https://doi.org/10.5194/egusphere-egu21-16405, 2021.

13:55–13:57

EGU21-14458

ECS

Preserving High Value Legacy Collections for Future Research – The McNaughton Collection

Eleanore Blereau, Amanda Bellenger, and Brent McInnes

During his long career in ion probe geochemistry, Professor Neal McNaughton built up an impressive collection of samples. Professor McNaughton served as SHRIMP geochronologist for the Centre of Global Metallogeny at the University of Western Australia (1994-2005), the Western Australia Centre for Exploration Targeting (2005-2007), and the John de Laeter Centre (JdLC) at Curtin University (2007-2019), and upon his retirement he donated his collection of epoxy-mounted samples to the Geological Survey of Western Australia (GSWA). This collection of over 1000 mounts containing over 4000 samples is full of irreplaceable material, representing over 20 years of geochronological research and development on the SHRIMP II in the JdLC. The collection is a highly valuable resource for future geochemical and geochronological research; however, the entire collection lacked a digital footprint. When this project started, there was no unified approach to geoscience metadata and no template for preserving such a collection. In a jointly funded effort by AuScope, GSWA and Curtin University, a digital sample catalogue of the collection with digitised materials was successfully created. We followed the FAIR data principles and used International Geo Sample Numbers (IGSNs) as persistent identifiers to create the most impactful, accessible and visible product. The final catalogue, associated metadata and digital materials are now publicly available online on a number of digital platforms, such as Research Data Australia and GSWA’s GeoVIEW.WA, and the mounts can be borrowed from GSWA for future analysis. These efforts preserved the physical materials for future loans and analysis while making them visible in our digital age. We will outline the template and workflow used by this project, which can be applied to preserve similarly high-value collections and by current facilities, universities and researchers in their ongoing research, and share insights for future efforts.

How to cite: Blereau, E., Bellenger, A., and McInnes, B.: Preserving High Value Legacy Collections for Future Research – The McNaughton Collection, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14458, https://doi.org/10.5194/egusphere-egu21-14458, 2021.

13:57–13:59

EGU21-15037

How to turn kilos of mud into megabytes of data? 10 years of efforts in curating lake sediment cores and their associated results

Fabien Arnaud, Cécile Pignol, Bruno Galabertier, Xavier Crosta, Isabelle Billy, Elodie Godinho, Karim Bernardet, Pierre Sabatier, Anne-Lise Develle, Rosalie Bruel, Julien Penguen, Pascal Calvat, Pierre Stéphan, and Mathias Rouan

Here we present a series of connected efforts aimed at curating sediment cores and their related data. Far from isolated, these efforts were conducted within structured national projects and led to the development of digital solutions and good practices in line with international standards and practices.

Our efforts aimed to ensure FAIR-compatible practices (Plomp, 2020; Wilkinson et al., 2016) throughout the life cycle of sediment cores, from fieldwork to published data. We adopted a step-by-step, bottom-up strategy to formalize a dataflow mirroring our workflow. We thus created a fieldwork mobile application (CoreBook) to gather information during coring operations and feed it into the French national virtual core repository “Cyber-Carothèque Nationale” (CCN). At this stage, allocating an international persistent unique identifier was crucial, and we naturally chose the IGSN.

Beyond the traceability of samples, the curation of analysis data remains challenging. Most international repositories (e.g. NOAA palaeo-data, PANGAEA) have approached the problem from the top down, offering facilities to display published datasets with a persistent unique identifier (DOI). Yet those data are only a fraction of the total amount of acquired data. Moreover, those repositories have very low requirements for the preservation and display of metadata, in particular analytical parameters, but also fieldwork data, which are essential for data reusability. Finally, these repositories do not provide a synoptic view of the several strata of analyses that have been conducted on the same core through different research programs and publications. A partial solution is proposed by the eLTER metadata standard DEIMS, which offers a discovery interface for rich metadata. To bridge the gap between generalist data repositories and sample display systems (such as CCN, but also IMLGS, to cite an international system), we developed a data repository and visualizer dedicated to the reuse of lake sediment cores, samples and sampling locations (ROZA, Retro-Observatory of the Zone Atelier). This system is still a prototype but already opens interesting perspectives.

Finally, the digital evolution of science allows free data-processing software to spread worldwide. In that framework, we developed “Serac”, an open-source R package to establish radionuclide-based age models following the most common sedimentation hypotheses (serac,). By having this R package take as input a rich metadata file that gathers links to IGSNs and other quality metadata, we link fieldwork metadata, the physical storage of the core, and the analytical metadata. Serac also stores the data-processing procedure in a standardized way. We therefore think that the development of such software could help spread good practices in data curation and favour the use of unique identifiers.
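One of the common sedimentation hypotheses for ²¹⁰Pb dating is CFCS (constant flux, constant sedimentation rate), in which excess ²¹⁰Pb decays exponentially with depth: ln(²¹⁰Pb_ex) = ln(²¹⁰Pb_ex,0) − (λ/r)·z, so a least-squares slope b gives the rate r = −λ/b. The Python sketch below shows this textbook model; it is not Serac's code, and it uses linear depth in cm (assuming constant compaction) where mass depth is often preferred. The half-life constant is a commonly used value.

```python
import math
from typing import Sequence

PB210_HALF_LIFE_YR = 22.3  # commonly used value; some studies use 22.2 yr
LAMBDA = math.log(2) / PB210_HALF_LIFE_YR  # decay constant, 1/yr


def cfcs_rate(depths_cm: Sequence[float], pb_excess: Sequence[float]) -> float:
    """Fit ln(Pb_ex) = a + b*z by ordinary least squares; return r = -lambda/b in cm/yr."""
    ys = [math.log(p) for p in pb_excess]
    n = len(depths_cm)
    mx = sum(depths_cm) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(depths_cm, ys))
    sxx = sum((x - mx) ** 2 for x in depths_cm)
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("excess 210Pb must decrease with depth for a CFCS model")
    return -LAMBDA / slope


def cfcs_age(depth_cm: float, rate_cm_yr: float) -> float:
    """Age (years before coring) at a given depth under a constant sedimentation rate."""
    return depth_cm / rate_cm_yr
```

Under CFCS, age is simply depth divided by rate; the CRS and CIC models relax different parts of this assumption.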

By tackling all aspects of data creation and curation throughout the life cycle of a lake sediment core, we are now able to propose a theoretical model of data curation for this particular type of sample that could serve as the basis for further development of integrated data curation systems.

How to cite: Arnaud, F., Pignol, C., Galabertier, B., Crosta, X., Billy, I., Godinho, E., Bernardet, K., Sabatier, P., Develle, A.-L., Bruel, R., Penguen, J., Calvat, P., Stéphan, P., and Rouan, M.: How to turn kilos of mud into megabytes of data? 10 years of efforts in curating lake sediment cores and their associated results, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15037, https://doi.org/10.5194/egusphere-egu21-15037, 2021.

13:59–14:01

EGU21-16550

AusGeochem and Big Data Analytics in Low-Temperature Thermochronology

Samuel Boone, Fabian Kohlmann, Moritz Theile, Wayne Noble, Barry Kohn, Stijn Glorie, Martin Danišík, and Renjie Zhou

The AuScope Geochemistry Network (AGN) and partners Lithodat Pty Ltd are developing AusGeochem, a novel cloud-based platform for Australian-produced geochemistry data from around the globe. The open platform will allow laboratories to upload, archive, disseminate and publish their datasets, as well as perform statistical analyses and data synthesis within the context of large volumes of publicly funded geochemical data. As part of this endeavour, representatives from four Australian low-temperature thermochronology laboratories (University of Melbourne, University of Adelaide, Curtin University and University of Queensland) are advising the AGN and Lithodat on the development of low-temperature thermochronology (LTT)-specific data models for the relational AusGeochem database and its international counterpart, LithoSurfer. These schemas will facilitate the structured archiving of a wide variety of thermochronology data, enabling geoscientists to readily perform LTT Big Data analytics and gain new insights into the thermo-tectonic evolution of Earth’s crust.

Adopting established international data reporting best practices, the LTT expert advisory group has designed database schemas for the fission track and (U-Th-Sm)/He methods, as well as for thermal history modelling results and metadata. In addition to recording the parameters required for LTT analyses, the schemas include fields for reference material results and error reporting, allowing AusGeochem users to independently perform QA/QC on data archived in the database. Development of scripts for the automated upload of data directly from analytical instruments into AusGeochem using its open-source Application Programming Interface is currently under way.
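Storing reference material results alongside unknowns lets users run simple QA/QC checks themselves. A minimal Python sketch, assuming the check compares a measured reference-material age with its accepted value through a z-score, z = (measured − accepted) / √(σ_m² + σ_a²); the function name and the |z| ≤ 2 threshold are illustrative choices, not AusGeochem features.

```python
import math
from typing import Tuple


def reference_check(measured_ma: float, measured_1s: float,
                    accepted_ma: float, accepted_1s: float,
                    z_max: float = 2.0) -> Tuple[float, bool]:
    """Return (z, passed) for a reference-material age against its accepted value.

    Uncertainties are 1-sigma and combined in quadrature; z_max is an assumed threshold.
    """
    z = (measured_ma - accepted_ma) / math.hypot(measured_1s, accepted_1s)
    return z, abs(z) <= z_max
```

Applied across every session in the database, the same check flags laboratories or runs whose reference-material results drift from accepted values.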

The advent of an LTT relational database heralds the beginning of a new era of Big Data analytics in the field of low-temperature thermochronology. By methodically archiving detailed LTT (meta)data in structured schemas, previously intractable datasets comprising thousands of analyses produced by numerous laboratories can be readily interrogated in new and powerful ways. These include rapid derivation of inter-data relationships, facilitating on-the-fly age computation, statistical analysis and data visualisation. With the detailed LTT data stored in relational schemas, measurements can be recalculated and remodelled using user-defined constants and kinetic algorithms. This allows analyses determined using different parameters to be compared on a consistent basis across regional to global scales.
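Recalculation with user-defined constants is easiest to see for the zeta-calibrated external-detector fission-track age, t = (1/λ_D)·ln(1 + λ_D·ζ·g·ρ_d·N_s/N_i). The Python sketch below implements that standard equation; the function name and defaults are illustrative, not AusGeochem's implementation. Because the raw counts and dosimeter density are stored, the age can be recomputed with a different ζ or decay constant.

```python
import math

LAMBDA_D = 1.55125e-10  # total 238U decay constant (1/yr), Jaffey et al. (1971)


def ft_age_ma(ns: int, ni: int, rho_d: float, zeta: float,
              g: float = 0.5, lambda_d: float = LAMBDA_D) -> float:
    """Zeta-calibrated external-detector fission-track age in Ma.

    ns, ni: spontaneous and induced track counts; rho_d: dosimeter track
    density (tracks/cm^2); zeta: calibration factor (yr cm^2); g: geometry
    factor (0.5 for the external detector method).
    """
    t_yr = math.log(1.0 + lambda_d * zeta * g * rho_d * ns / ni) / lambda_d
    return t_yr / 1e6
```

For example, `ft_age_ma(500, 1000, 1.0e6, 350.0)` gives an age of roughly 87 Ma; rerunning it with a different analyst's zeta puts legacy and new analyses on the same calibration.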

This novel tool marks the start of structured Big Data in low-temperature thermochronology, improving laboratories’ ability to manage and share their data in alignment with the FAIR data principles while enabling analysts to interrogate large datasets in new and powerful ways.

How to cite: Boone, S., Kohlmann, F., Theile, M., Noble, W., Kohn, B., Glorie, S., Danišík, M., and Zhou, R.: AusGeochem and Big Data Analytics in Low-Temperature Thermochronology, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-16550, https://doi.org/10.5194/egusphere-egu21-16550, 2021.

14:01–14:03

EGU21-3363

ECS

Ordination analyses in sedimentology, geochemistry and paleoenvironment - current trends and recommendations

Or Mordechay Bialik, Emilia Jarochowska, and Michal Grossowicz

Ordination is a family of multivariate exploratory data analysis methods. With the advent of high-throughput data acquisition protocols, community databases, and multiproxy studies, the use of ordination in Earth sciences has snowballed. As data management and analytical tools expand, this growing body of knowledge opens new possibilities for meta-analyses and data mining across studies. This requires analyses to be chosen to suit the character of Earth science data, including pre-treatment consistent with the precision and accuracy of the variables, as well as appropriate documentation. To investigate the current situation in Earth sciences, we surveyed 174 ordination analyses in 163 publications in the fields of geochemistry, sedimentology and palaeoenvironmental reconstruction and monitoring. We focussed on studies using Principal Component Analysis (PCA), Non-Metric Multidimensional Scaling (NMDS) and Detrended Correspondence Analysis (DCA).

PCA was the most common type of analysis (84%), with the other two accounting for ca. 12% each. Of 128 uses of PCA, only 5 included a test for normality, and most of these tests were not applied or documented correctly. Common problems include: (1) not providing information on the dimensions of the analysed matrix (16% of cases); (2) using a larger number of variables than observations (24 cases); (3) not documenting the distance metric used in NMDS (55% of cases); and (4) lack of information on the software used (38% of cases). The majority (53%) of surveyed studies did not provide the data used for analysis at all, and a further 35% provided data sets in a format that does not allow immediate, error-free reuse, e.g. as a data table directly in the article text or in a PDF appendix. The “gold standard” of placing a curated data set in an open-access repository was followed by only 6 (3%) of the analyses. Among analyses that reported using code-based statistical environments such as R Software, SAS or SPSS, none provided the code that would allow the analyses to be reproduced.

Geochemical and Earth science data sets require expert knowledge, which should support analytical decisions and interpretations. Data analysis skills attract students to Earth sciences study programmes and offer a viable research alternative when field- or lab-based work is limited. However, many study curricula and publishing processes have not yet embraced this methodological progress, leading to situations where mentors, reviewers and editors cannot offer quality assurance for the use of ordination methods. We provide a review of solutions and annotated R Software code for PCA, NMDS and DCA of geochemical data sets in the freeware R Software environment, encouraging the community to reuse and further develop a reproducible ordination workflow.
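Several of the documentation gaps listed above (matrix dimensions, more variables than observations, pre-treatment) can be handled inside the analysis itself. The authors provide R code; the sketch below is a separate Python illustration, assuming NumPy is available, that standardizes each variable and runs PCA through a singular value decomposition, returning the quantities a paper should report. The function name and returned keys are hypothetical.

```python
import numpy as np


def pca_report(X, names):
    """Standardised PCA via SVD, returning the items a publication should document."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    if len(names) != p:
        raise ValueError("one name per variable (column) is required")
    if p >= n:
        # centred data have rank at most n - 1, so p >= n cannot resolve p components
        raise ValueError(f"{p} variables for {n} observations: too few observations")
    sd = X.std(axis=0, ddof=1)
    if np.any(sd == 0):
        raise ValueError("a constant variable cannot be standardised")
    Z = (X - X.mean(axis=0)) / sd  # centre and scale each variable
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return {
        "dimensions": (n, p),                         # report the matrix size
        "explained_variance_ratio": s**2 / np.sum(s**2),
        "loadings": {name: Vt[:, j] for j, name in enumerate(names)},
        "scores": U * s,                              # observations in PC space
    }
```

Standardizing first (a correlation-matrix PCA) keeps variables measured in different units, such as wt% oxides and ppm trace elements, from dominating the first components by scale alone.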

How to cite: Bialik, O. M., Jarochowska, E., and Grossowicz, M.: Ordination analyses in sedimentology, geochemistry and paleoenvironment - current trends and recommendations, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-3363, https://doi.org/10.5194/egusphere-egu21-3363, 2021.

14:03–14:05

EGU21-10344

Managing geochemical data within the U.S. Geological Survey: An overview of policies and approaches from the perspective of the Energy Resources Program

Justin Birdwell, Christina DeVera, Katherine French, Steve Groves, Gregory Gunther, Aaron Jubb, Toni Rozinek, Augusta Warden, and Michele Wolf

The mission of the U.S. Geological Survey (USGS) Energy Resources Program (ERP) is to provide unbiased scientific information to stakeholders by conducting and disseminating research into energy-related issues mandated by the Administration or Congress or guided by ERP and USGS leadership. USGS Fundamental Science Practices (FSP) form the foundation for these efforts, representing a set of consistent procedures, ethical requirements, and operational principles that direct how research activities are conducted to ensure the highest standard of scientific integrity and transparency. Policies created to meet the goals of FSP guide how work is performed and how resulting information products are curated through the development, review, and approval processes. Though FSP have been a core part of the USGS mission since its inception, several new policies have been developed and implemented over the last decade related to data generation, management, and distribution to make practices, particularly those involving laboratory-generated geochemical data, more standardized and consistent across the USGS’ different scientific mission areas.

The ERP has been at the forefront of implementing these policies, particularly those that relate to laboratory-based science. For example, a new USGS-wide Quality Management System (QMS) was initially rolled out in ERP laboratories. QMS quality assurance requirements for laboratories were developed to ensure generation of data of known and documented quality and to support a culture of continuous improvement. QMS requirements include controls on sample receipt, login, and storage; documentation of data generation methods and standard operating procedures for sample preparation and analysis; and quality control procedures for equipment calibration and maintenance and for data acceptance criteria. Many of the requirements are currently being met in the Petroleum Geochemistry Research Laboratory (PGRL) through a laboratory information management system (LIMS), which provides a centralized storage location for data recording, reduction, review, and reporting. Samples processed by PGRL are identified from login to reporting by a unique lab-assigned number. Data are reviewed by the analyst, a secondary reviewer, and the laboratory manager before being either accepted or qualified to flag issues identified during analysis. A similar documentation approach is also applied to new research methods, experimental work, and modifications of existing processes.
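The sample tracking and three-step review described above can be pictured as a small state machine. The Python sketch below is hypothetical and does not describe the PGRL LIMS: the `LAB-000001` number format, the role names, the strict review order and the rule that any recorded issue turns a completed review into a "qualified" rather than "accepted" result are illustrative assumptions.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical review order: analyst, then secondary reviewer, then laboratory manager.
REVIEW_SEQUENCE = ("analyst", "secondary_reviewer", "lab_manager")
_next_number = count(1)  # a real LIMS would draw numbers from its database


@dataclass
class LabResult:
    sample_name: str
    # Unique lab-assigned number, fixed at login and carried through to reporting.
    lab_number: str = field(default_factory=lambda: f"LAB-{next(_next_number):06d}")
    signoffs: list = field(default_factory=list)
    qualifiers: list = field(default_factory=list)
    status: str = "pending"

    def sign_off(self, role, issue=None):
        """Record one review step; reviews must follow REVIEW_SEQUENCE order."""
        if self.status != "pending":
            raise ValueError(f"{self.lab_number} is already {self.status}")
        expected = REVIEW_SEQUENCE[len(self.signoffs)]
        if role != expected:
            raise ValueError(f"expected sign-off by {expected}, got {role}")
        self.signoffs.append(role)
        if issue is not None:
            self.qualifiers.append(issue)  # issue noted during analysis or review
        if len(self.signoffs) == len(REVIEW_SEQUENCE):
            # Complete review: accepted if clean, otherwise released with qualifiers.
            self.status = "qualified" if self.qualifiers else "accepted"
```

Keeping the qualifiers on the record, rather than only the final status, means the reason a result was qualified travels with its lab number into the eventual data release.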

Once reported to a submitter, geochemistry data are interpreted and incorporated into USGS reports and other outside publications, which are tracked using a single information product data system (IPDS). IPDS facilitates management of the internal review and approval processes for USGS information products. For geochemistry studies, data releases containing machine-readable laboratory-generated results along with associated metadata documentation typically accompany publications and have their own review and approval process. Once generated, data releases are given unique digital object identifiers (DOIs) for citation and persistent access, stored in ScienceBase, a Trusted Digital Repository for USGS products, and made accessible through the USGS Science Data Catalog (https://data.usgs.gov). Together, these systems allow ERP personnel to collect, manage, and track geochemical data, deliver high-quality scientific publications and datasets to the public in a timely manner, and support decision makers who manage domestic natural resources.

How to cite: Birdwell, J., DeVera, C., French, K., Groves, S., Gunther, G., Jubb, A., Rozinek, T., Warden, A., and Wolf, M.: Managing geochemical data within the U.S. Geological Survey: An overview of policies and approaches from the perspective of the Energy Resources Program, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10344, https://doi.org/10.5194/egusphere-egu21-10344, 2021.

14:05–14:07

EGU21-7876

ECS

How to publish your data with the EPOS Multi-scale Laboratories

Geertje ter Maat, Otto Lange, and Martyn Drury and the EPOS TCS Multi-scale Laboratories Team

EPOS (the European Plate Observing System) is a pan-European e-infrastructure framework with the goal of improving and facilitating the access, use, and re-use of Solid Earth science data. The EPOS Thematic Core Service Multi-scale Laboratories (TCS MSL) represent a community of European Solid Earth sciences laboratories including high-temperature and high-pressure experimental facilities, electron microscopy, micro-beam analysis, analogue tectonic and geodynamic modelling, paleomagnetism, and analytical laboratories.

Participants and collaborating laboratories from Belgium, Bulgaria, France, Germany, Italy, Norway, Portugal, Spain, Switzerland, The Netherlands, and the UK are already represented within the TCS MSL. Unaffiliated European Solid Earth sciences laboratories are welcome and encouraged to join the growing TCS MSL community.

Laboratory facilities are an integral part of Earth science research. The diversity of methods employed in such infrastructures reflects the multi-scale nature of the Earth system and is essential for the understanding of its evolution, for the assessment of geo-hazards, and the sustainable exploitation of geo-resources.

Although experimental data from these laboratories often form the backbone of scientific publications, they are frequently available only as images, graphs or tables in the text or as supplementary information to research articles. As a result, much of the collected data remains unpublished, unsearchable or even inaccessible, and is often preserved only in the short term.

The TCS MSL is committed to making Earth science laboratory data Findable, Accessible, Interoperable, and Reusable (FAIR). For this purpose, the TCS MSL encourages the community to share their data via DOI-referenced, citable data publications. To facilitate this and ensure the provision of rich metadata, we offer user-friendly tools, together with the necessary data management expertise, to support individual lab researchers in all aspects of publishing their data via partner repositories. Data published via TCS MSL are described using sustainable metadata standards enriched with controlled vocabularies used in the geosciences. The resulting data publications are also exposed through a dedicated TCS MSL online portal that brings together DOI-referenced data publications from partner research data repositories (https://epos-msl.uu.nl/). Efforts have already been made to interconnect new data (via metadata exchange) with existing databases such as MagIC (paleomagnetic data in EarthRef.org), and in the future we expect to extend and improve this practice with other repositories.

How to cite: ter Maat, G., Lange, O., and Drury, M. and the EPOS TCS Multi-scale Laboratories Team: How to publish your data with the EPOS Multi-scale Laboratories, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7876, https://doi.org/10.5194/egusphere-egu21-7876, 2021.

14:07–14:09

EGU21-16459

EarthChem Communities: Building Geochemical Data Best Practices with Researcher Engagement

Lucia Profeta, Kerstin Lehnert, Lulin Song, and Juan David Figueroa

Acquisition and use of geochemical data are pervasive in the Earth, Environmental and Planetary Sciences, as they are fundamental to our understanding of past, present, and future processes in natural systems, from the interior of the Earth to its surface environments on land, in the oceans, and in the air, to the entire solar system. Accordingly, the range of research communities that generate and use geochemical data is quite extensive. Data practices and workflows for processing, reporting, sharing, and using data are numerous and differ between research communities. Furthermore, the data generated are highly diverse with respect to analyzed parameters, analyzed materials, analytical techniques and instrumentation, as well as volume, size, and format. This makes it difficult to define generally applicable best practices and standards for geochemical data that the entire range of geochemical data communities will adopt. While it is technically possible to describe and encode the large variety of geochemical measurements in the consistent, unifying way provided by the Observations and Measurements (O&M) conceptual model (https://www.ogc.org/standards/om), communities need to build consensus around the specifics of data formats, metadata, and vocabularies, and, most importantly, they need to ‘own’ the best practices to ensure adoption.
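To make the unifying idea of O&M more concrete, the sketch below encodes two quite different geochemical measurements with one shared set of fields. The field names loosely follow the core properties of an O&M observation (feature of interest, observed property, procedure, result, phenomenon time, result time), but the Python class is an illustrative assumption rather than an official O&M encoding, and the sample identifiers and numeric values are invented placeholders, not real data.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Observation:
    """One measurement, loosely modelled on the core O&M observation properties."""
    feature_of_interest: str  # the analysed sample, ideally a globally unique ID
    observed_property: str    # what was measured, ideally from a controlled vocabulary
    procedure: str            # analytical technique or instrument
    result: float
    unit: str
    phenomenon_time: str      # time the measured property refers to (ISO 8601)
    result_time: str          # time the result became available (ISO 8601)


def to_json(observations):
    """Serialise heterogeneous measurements with one shared schema."""
    return json.dumps([asdict(o) for o in observations], indent=2)


# Placeholder values only; identifiers and numbers are hypothetical.
examples = [
    Observation("SAMPLE-0001", "SiO2 mass fraction in glass", "electron microprobe",
                72.0, "wt%", "2019-07-14", "2020-02-03"),
    Observation("SAMPLE-0002", "87Sr/86Sr", "TIMS",
                0.7090, "1", "2018-05-02", "2020-03-11"),
]

if __name__ == "__main__":
    print(to_json(examples))
```

The generic structure is the easy part; what a community still has to agree on is the content of fields such as `observed_property` and `procedure`, which is exactly the consensus-building the abstract describes.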

EarthChem is a data facility for geochemistry, funded by the US National Science Foundation since 2006, to develop and operate community-driven services that support the discovery, access, preservation, reusability, and interoperability of geochemical data. EarthChem has a long record of engaging with the global research community to develop and promote data best practices for geochemistry by, for example, initiating and helping to organize the Editors Roundtable (Goldstein et al. 2014, http://dx.doi.org/10.1594/IEDA/100426). In recent years, as researchers have become increasingly aware of the benefits and requirements of FAIR data management, EarthChem has supported research communities wanting to establish consistent data formats and rich metadata for better findability and reproducibility of specific data types acquired and used within these communities. EarthChem now works with community advisers to build consensus around data best practices, provide resources for researchers to comply with these best practices, and streamline data submission and data access for these communities. EarthChem provides Community web pages as spaces to explain community-specific best practices, offer downloadable data templates, and link to customized community portals for data submission and access. EarthChem is in the process of defining guidelines and policies that will ensure that the best practices and data templates promoted by an EarthChem Community are indeed community endorsed. By making sure that the community-specific best practices align with more general data standards such as the elements of the O&M conceptual data model or the use of globally unique identifiers for samples, EarthChem Communities can advance overarching data best practices and standards that will improve reusability of geochemical data and data exchange among distributed databases. Initial EarthChem Communities include Tephra, Clumped Isotopes, and Experimental Petrology. Additional communities such as GeoHealth and Laser Induced Breakdown Spectroscopy are currently in an exploratory stage.

How to cite: Profeta, L., Lehnert, K., Song, L., and Figueroa, J. D.: EarthChem Communities: Building Geochemical Data Best Practices with Researcher Engagement, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-16459, https://doi.org/10.5194/egusphere-egu21-16459, 2021.

14:09–14:11

EGU21-2521

ECS

Automation of (meta-)data workflows from field to data repository

Linda Baldewein, Ulrike Kleeberg, and Lars Möller

In Earth and environmental sciences, data obtained from analyses of field samples make up a significant portion of all research data; the samples are often collected at significant cost and cannot be collected again under identical conditions. If important metadata is not immediately secured and stored in the field, the quality and re-usability of the resulting data will be diminished.

At the Helmholtz Coastal Data Center (HCDC), a metadata and data workflow for biogeochemical data has been developed over the last few years to ensure the quality and richness of metadata and to make the final data product FAIR. It automates and standardizes data transfer from the campaign planning stage, through sample collection in the field, analysis, and quality control, to storage in databases and publication in repositories.

Prior to any sampling campaign, the scientists are equipped with a customized tablet app that enables them to record relevant metadata, such as the date and time of sampling, the scientists involved, and the type of sample collected. Each sample and station already receives a unique identifier at this stage. The location is retrieved directly from a high-accuracy GNSS receiver connected to the tablet. The metadata is transmitted via a mobile data connection to the institution’s cloud storage.

After the campaign, the metadata is quality checked by the field scientists and the data curator and stored in a relational database. Once the samples are analyzed in the lab, the data is imported into the database and connected to the corresponding metadata using a template. Data DOIs are registered for finalized datasets in close collaboration with the World Data Center PANGAEA. The data sets are discoverable through their DOIs as well as through the HCDC data portal and the API of the metadata catalogue service.
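The link between field metadata and later lab results depends on the identifiers issued in the field. A minimal sketch using Python's built-in `sqlite3` module is shown below; the table layout, column names and the use of random UUIDs as station and sample identifiers are assumptions for illustration, not the HCDC schema. A foreign-key constraint rejects lab results whose sample identifier was never registered in the field.

```python
import sqlite3
import uuid


def make_db():
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    con.executescript("""
        CREATE TABLE station (station_id TEXT PRIMARY KEY, lat REAL, lon REAL);
        CREATE TABLE sample (
            sample_id   TEXT PRIMARY KEY,
            station_id  TEXT NOT NULL REFERENCES station(station_id),
            sampled_at  TEXT NOT NULL,   -- ISO 8601 timestamp recorded in the field
            collector   TEXT NOT NULL,
            sample_type TEXT NOT NULL
        );
        CREATE TABLE lab_result (
            sample_id TEXT NOT NULL REFERENCES sample(sample_id),
            parameter TEXT NOT NULL,
            value     REAL,
            unit      TEXT
        );
    """)
    return con


def register_station(con, lat, lon):
    station_id = str(uuid.uuid4())  # identifier issued in the field
    with con:  # commit immediately so later failures cannot roll this back
        con.execute("INSERT INTO station VALUES (?, ?, ?)", (station_id, lat, lon))
    return station_id


def register_sample(con, station_id, sampled_at, collector, sample_type):
    sample_id = str(uuid.uuid4())
    with con:
        con.execute("INSERT INTO sample VALUES (?, ?, ?, ?, ?)",
                    (sample_id, station_id, sampled_at, collector, sample_type))
    return sample_id


def import_lab_results(con, rows):
    """rows: (sample_id, parameter, value, unit) tuples, e.g. parsed from a lab template.

    The whole batch is rolled back if any row refers to an unknown sample.
    """
    with con:
        con.executemany("INSERT INTO lab_result VALUES (?, ?, ?, ?)", rows)


def results_with_metadata(con):
    """Join each lab result back to its field metadata via the shared identifiers."""
    return con.execute("""
        SELECT s.sample_id, s.sampled_at, st.lat, st.lon, r.parameter, r.value, r.unit
        FROM lab_result r
        JOIN sample s ON s.sample_id = r.sample_id
        JOIN station st ON st.station_id = s.station_id
    """).fetchall()
```

Rejecting a whole batch on a single unknown identifier is a deliberate choice in this sketch: it forces a mismatch between the lab template and the field records to be resolved before any data enter the database.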

This workflow is well established within the institute but is still being refined to make it more sophisticated and FAIRer. For example, automated assignment of International Geo Sample Numbers (IGSNs) to all samples is currently being planned.

How to cite: Baldewein, L., Kleeberg, U., and Möller, L.: Automation of (meta-)data workflows from field to data repository, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-2521, https://doi.org/10.5194/egusphere-egu21-2521, 2021.

14:11–14:13

EGU21-14628

ECS

The AuScope Geochemistry Network and the AusGeochem geochemistry data platform

Alexander Prent, Hayden Dalton, Samuel Boone, Guillaume Florin, Yoann Greau, Brent McInnes, Andrew Gleadow, Suzanne O'Reilly, Barry Kohn, Erin Matchan, Olivier Alard, Tim Rawling, Fabian Kohlmann, Moritz Theile, and Wayne Noble

The AuScope Geochemistry Network (AGN, www.auscope.org.au/agn) was established in 2019 in response to a community-expressed desire for closer collaboration and coordination of activities between Australian geochemistry laboratories. Its aims include: (i) promoting capital and operational investment in new, advanced geochemical infrastructure; (ii) supporting increased end-user access to laboratory facilities and research data; and (iii) fostering collaboration and professional development via online tools, training courses, and workshops. Over the last six months, the AGN has coordinated a monthly webinar series to engage the geoscience community, promote FAIR data practices, and foster new collaborations. These webinars were recorded for future use and can be found at: www.youtube.com/channel/UC0zzzc6_mrJEEdCS_G4HYgg.

A primary goal of the AGN is to make the network’s laboratory geochemistry data, from around the globe, discoverable and accessible through an online data platform called AusGeochem (www.auscope.org.au/ausgeochem). Geochemical data models for SHRIMP U-Pb, Fission Track, U-Th/He, LA-ICP-MS U-Pb/Lu-Hf, and Ar-Ar are being developed using international best practice and are informed by expert advisory groups consisting of members from various institutes and laboratories within Australia. AusGeochem is being designed to provide an online data service for analytical laboratories and researchers, where sample and analytical data can be uploaded (privately) for processing, synthesis, and secure dissemination to collaborators. Researcher data can be retained in a private space but studied within the context of other publicly available data. Researchers can also generate unique International Geo Sample Numbers (IGSNs) for their samples via a built-in link to the Australian Research Data Commons IGSN registry.

AusGeochem supports FAIR data practices by allowing researchers to include links to their AusGeochem-registered data in research publications, which could eventually position AusGeochem as a trusted data repository.

How to cite: Prent, A., Dalton, H., Boone, S., Florin, G., Greau, Y., McInnes, B., Gleadow, A., O'Reilly, S., Kohn, B., Matchan, E., Alard, O., Rawling, T., Kohlmann, F., Theile, M., and Noble, W.: The AuScope Geochemistry Network and the AusGeochem geochemistry data platform, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14628, https://doi.org/10.5194/egusphere-egu21-14628, 2021.

14:13–14:15

EGU21-13940

Linking data systems into a collaborative pipeline for geochemical data from field to archive

Kerstin Lehnert, Daven Quinn, Basil Tikoff, Douglas Walker, Sarah Ramdeen, Lucia Profeta, Shanan Peters, and Jonathan Pauli

Management of geochemical data needs to consider the sequence of phases in the lifecycle of these data from field to lab to publication to archive. It also needs to address the large variety of chemical properties measured; the wide range of materials that are analyzed; the different ways in which these materials may be prepared for analysis; the diversity of analytical techniques and instrumentation used to obtain analytical results; and the many ways used to calibrate and correct raw data, normalize them to standard reference materials, and otherwise treat them to obtain meaningful and comparable results. To extract knowledge from the data, they are then integrated and compared with other measurements, formatted for visualization, statistical analysis, or model generation, and finally cleaned and organized for publication and deposition in a data repository. Each phase in the geochemical data lifecycle has its own workflows and metadata that need to be recorded to fully document the provenance of the data so that others can reproduce the results.

An increasing number of software tools are being developed to support the different phases of the geochemical data lifecycle. These include electronic field notebooks, digital lab books, and Jupyter notebooks for data analysis, as well as data submission forms and templates. These tools are mostly disconnected and often require manual transcription or copying and pasting of data and metadata from one tool to another. In an ideal world, these tools would be connected so that field observations gathered in a digital field notebook, such as sample locations and sampling dates, could be sent seamlessly to an IGSN Allocating Agent to obtain a unique sample identifier and QR code with a single click. The sample metadata would be readily accessible to the lab data management system, which allows researchers to capture information about sample preparation and connects to the instrumentation to capture instrument settings and raw data. The data would then be accessed seamlessly by data reduction software, visualized, and compared with data from global databases that can be accessed directly. Ultimately, a few clicks would allow the user to format the data for publication and archiving.
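The hand-off imagined above, in which each tool receives the previous tool's record instead of a retyped copy, can be sketched as a chain of functions that each enrich one sample record and append an entry to its provenance. Everything here is hypothetical: the stage names and record fields are invented, and `allocate_identifier` only derives a local placeholder ID from the sample name; it does not contact SESAR or any real IGSN allocating agent.

```python
import hashlib
from dataclasses import asdict, dataclass, field, replace
from typing import Optional


@dataclass
class SampleRecord:
    name: str
    latitude: float
    longitude: float
    sampled_on: str                        # ISO 8601 date from the field notebook
    identifier: Optional[str] = None
    lab_metadata: dict = field(default_factory=dict)
    provenance: list = field(default_factory=list)


def stage(label):
    """Decorator: run a stage, then append its label to the record's provenance."""
    def wrap(fn):
        def run(record):
            updated = fn(record)
            return replace(updated, provenance=record.provenance + [label])
        return run
    return wrap


@stage("identifier allocated")
def allocate_identifier(record):
    # Placeholder only: a real pipeline would request an IGSN from an allocating agent.
    digest = hashlib.sha1(record.name.encode("utf-8")).hexdigest()[:8].upper()
    return replace(record, identifier=f"LOCAL-{digest}")


@stage("sample preparation recorded")
def record_preparation(record):
    return replace(record, lab_metadata={**record.lab_metadata,
                                         "preparation": "crushed and sieved"})


@stage("exported for publication")
def export(record):
    if record.identifier is None:
        raise ValueError("cannot publish a sample without an identifier")
    return record


def run_pipeline(record, stages):
    """Pass one record through each stage in turn; no manual re-entry in between."""
    for step in stages:
        record = step(record)
    return record


if __name__ == "__main__":
    final = run_pipeline(
        SampleRecord("outcrop 12, basalt", -33.9, 151.2, "2021-03-01"),
        [allocate_identifier, record_preparation, export],
    )
    print(asdict(final))
```

Each stage returns a new record instead of modifying the old one, so the provenance list reflects exactly the stages a given record has passed through, which is the kind of lifecycle documentation the abstract calls for.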

Several data systems that support different stages in the lifecycle of samples and sample-based geochemical data have now come together to explore the development of standardized interfaces and APIs and consistent data and metadata schemas to link their systems into an efficient pipeline for geochemical data from the field to the archive. These systems include StraboSpot (www.strabospot.org; data system for digital collection, storage, and sharing of both field and lab data), SESAR (www.geosamples.org; sample registry and allocating agent for IGSN), EarthChem (www.earthchem.org; publisher and repository for geochemical data), Sparrow (sparrow-data.org; data system to organize analytical data and track project- and sample-level metadata), IsoBank (isobank.org; repository for stable isotope data), and Macrostrat (macrostrat.org; collaborative platform for geological data exploration and integration).

How to cite: Lehnert, K., Quinn, D., Tikoff, B., Walker, D., Ramdeen, S., Profeta, L., Peters, S., and Pauli, J.: Linking data systems into a collaborative pipeline for geochemical data from field to archive, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13940, https://doi.org/10.5194/egusphere-egu21-13940, 2021.

14:15–15:00

Meet the authors in their breakout text chats