EGU24-20519, updated on 11 Mar 2024
https://doi.org/10.5194/egusphere-egu24-20519
EGU General Assembly 2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

A pragmatic approach to complex citations, closing the provenance gap between IPCC AR6 figures and CMIP6 simulations

Charlotte Pascoe1, Martina Stockhause2, Graham Parton1, Ellie Fisher1, Molly MacRae1, Beate Kreuss2, and Lina Sitz3
  • 1CEDA Centre for Environmental Data Analysis, RAL Space, STFC, Harwell, United Kingdom
  • 2DKRZ German Climate Computing Center, Hamburg, Germany
  • 3Instituto de Fisica de Cantabria (IFCA/CSIC), Cantabria, Spain

Many of the figures in the WGI contribution to the IPCC Sixth Assessment Report (AR6) are derived from the data of multiple CMIP6 simulations. For instance, the plot of projected global temperature change in Figure 2 of Chapter 4 of the IPCC AR6 is based on data from 183 CMIP6 simulation datasets. The figure helpfully tells us which CMIP6 experiments were used as input data but does not provide information about the models that ran the simulations. It is possible to deduce the specific input data from supplementary tables in the IPCC assessment report and from the report’s annexes. However, these information sources are not machine-accessible, so they are difficult to use for tracing purposes; they do not enter indexing services, so they are not sufficient to give credit; and they are not part of the printed report, so they are difficult to find. Even if we gather this knowledge to create a navigable provenance network for the figure, we are still left with the unwieldy prospect of rendering 183 data citations for an outwardly simple plot.

We require a compact way to provide traceable provenance for large input data networks, one that makes transparent the specific input data used to create the CMIP6-based figures in IPCC AR6 and gives credit to modelling centres for the effort of running the simulations. This is the so-called complex citation discussed within the RDA Complex Citation Working Group.

We present a pragmatic solution to the complex citation challenge that uses an existing public infrastructure technology, Zenodo. The work establishes traceability by collating references to a figure’s input datasets within a Zenodo record, and establishes credit via Zenodo’s relatedWorks feature/DataCite’s relations, which link to existing data objects through Persistent Identifiers (PIDs), in this case the CMIP6 data citations. Whilst a range of PIDs exists to support connections between objects, DOIs are widely used for citations and are well connected within the wider PID graph landscape, and Zenodo provides a tool to create objects that use the DOI schema provided by DataCite. CMIP6 data citations have sufficient granularity to assign credit, but their granularity is not fine enough for traceability purposes. Therefore, Zenodo reference handle groups are used to identify specific input datasets, and Zenodo connected objects provide the join between them.
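The granularity gap described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that dataset identifiers follow the CMIP6 facet order (mip_era.activity.institution.source.experiment.member.table.variable.grid.version) and that a credit-level citation corresponds to the first five facets. The dataset identifiers, the DOIs, the helper names `citation_key` and `build_record`, and the choice of the `isDerivedFrom` relation are all hypothetical placeholders. The `related_identifiers` field mirrors the shape used in Zenodo deposit metadata, but no request is sent to Zenodo here.

```python
from collections import defaultdict


def citation_key(dataset_id: str) -> str:
    """Truncate a dataset ID to its first five facets (assumed credit granularity)."""
    return ".".join(dataset_id.split(".")[:5])


def build_record(title, dataset_ids, citation_dois):
    """Group fine-grained dataset IDs under coarser citation DOIs.

    Returns (metadata, groups): metadata is a dict shaped like Zenodo deposit
    metadata; groups maps each citation key to the dataset IDs it covers.
    """
    groups = defaultdict(list)
    for dataset_id in dataset_ids:
        groups[citation_key(dataset_id)].append(dataset_id)

    # One related identifier per citation DOI, however many datasets share it.
    related = [
        {"identifier": citation_dois[key], "relation": "isDerivedFrom"}
        for key in sorted(groups)
    ]
    metadata = {
        "title": title,
        "upload_type": "other",
        "description": f"Input data for a figure: {len(dataset_ids)} datasets.",
        "related_identifiers": related,
    }
    return metadata, dict(groups)


# Hypothetical inputs: placeholder dataset IDs and placeholder DOIs.
datasets = [
    "CMIP6.ScenarioMIP.INST-A.MODEL-A.ssp126.r1i1p1f1.Amon.tas.gn.v20200101",
    "CMIP6.ScenarioMIP.INST-A.MODEL-A.ssp126.r2i1p1f1.Amon.tas.gn.v20200101",
    "CMIP6.ScenarioMIP.INST-B.MODEL-B.ssp126.r1i1p1f1.Amon.tas.gn.v20200101",
]
citation_dois = {
    "CMIP6.ScenarioMIP.INST-A.MODEL-A.ssp126": "10.0000/placeholder.model-a.ssp126",
    "CMIP6.ScenarioMIP.INST-B.MODEL-B.ssp126": "10.0000/placeholder.model-b.ssp126",
}

metadata, groups = build_record("Hypothetical figure inputs", datasets, citation_dois)
for entry in metadata["related_identifiers"]:
    print(entry["relation"], entry["identifier"])
```

In this sketch, three input datasets collapse to two credit-level links, while `groups` retains the full list of specific datasets needed for traceability.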

There is still work to be done to establish full visibility of credit referenced within the Zenodo records.  However, we hope to engage the community by presenting our pragmatic solution to the complex citation challenge, one that has the potential to provide modelling centres with a route to a more complete picture of the impact of their simulations.

How to cite: Pascoe, C., Stockhause, M., Parton, G., Fisher, E., MacRae, M., Kreuss, B., and Sitz, L.: A pragmatic approach to complex citations, closing the provenance gap between IPCC AR6 figures and CMIP6 simulations, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20519, https://doi.org/10.5194/egusphere-egu24-20519, 2024.