- 1 Bremen, Germany
- 2 Bozeman, USA
As founders and former chief editors of Earth System Science Data (ESSD), the authors are concerned about the reproducibility and availability of important scientific sources and findings, and about timely access to scientific data and data-related services. We discuss (1) incidents affecting the availability of datasets with DOIs and their metadata and (2) a recent outage of an important data infrastructure.
The authors consider both observations serious enough to wonder why the underlying facts and realities are not discussed more widely in this community.
1) The most cited dataset published through ESSD is the series of yearly reports on the Global Carbon Budget, e.g. the latest, https://doi.org/10.5194/essd-2024-519. These articles are cited hundreds of times in scientific publications and routinely inform the United Nations climate change conferences (COPs). The first datasets of the series were held, and assigned DOIs, by the Carbon Dioxide Information Analysis Center (CDIAC), which was hosted by the Oak Ridge National Laboratory. When CDIAC was shut down in 2017, the datasets were transferred to a repository at another US national lab, losing most of the metadata in the process, most notably authorship. Thankfully, hosting of post-2017 additions to the dataset series has been taken over by the Integrated Carbon Observing System (ICOS), and DOIs to all elements of the series still resolve (albeit in a sloppy manner for pre-2018 data). One could argue that the most reliable holder of meta-information about this data, which is important not just scientifically, is not the repositories but ESSD, operated by a commercial publisher, Copernicus.
2) When Tropical Storm Helene hit North Carolina in September 2024, power and internet connectivity were lost at the Asheville headquarters of NOAA’s NCEI, an aggregator, archive and service provider for environmental data. Although NCEI is hosted at four geographically dispersed sites, NCEI data ingest and services came to a halt for several weeks. It appears that most data from the period during and after Helene have been collected retroactively, and services are fully available again. While NOAA’s real-time weather services, which are important for dealing with the emergency, seem to have been available during Helene, one is tempted to ask whether they could be interrupted under similar circumstances.
Both these and some other observations, which will be discussed at EGU2025, create the uncomfortable impression that the huge efforts of this community with respect to the FAIRness of data and the creation of a multitude of publicly funded infrastructure elements do not meet today’s needs, and possibly may not meet them tomorrow. If government labs and agencies of a rich nation cannot achieve this, who can?
(Part of this work was presented previously at a pre-conference workshop of RDA20, Gothenburg, 2023.)
How to cite: Pfeiffenberger, H. and Carlson, D.: Are Publicly Funded Data-Infrastructures Reliable?, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-20550, https://doi.org/10.5194/egusphere-egu25-20550, 2025.