EGU23-6831
https://doi.org/10.5194/egusphere-egu23-6831
EGU General Assembly 2023
© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Tracking and reporting peta-scale data exploitation within the Earth System Grid Federation through the ESGF Data Statistics service

Alessandra Nuzzo1, Fabrizio Antonio1, Maria Mirto1, Paola Nassisi1, Sandro Fiore2, and Giovanni Aloisio1
  • 1Centro Euro-Mediterraneo sui Cambiamenti Climatici (CMCC), Lecce, Italy
  • 2University of Trento, Trento, Italy

The Earth System Grid Federation (ESGF) is an international collaboration that powers most global climate change research and manages the first-ever decentralized repository for climate science data, holding multiple petabytes of data at dozens of federated sites worldwide. It is recognized as the leading infrastructure for managing and accessing large distributed data volumes for climate change research, and it supports the Coupled Model Intercomparison Project (CMIP) and the Coordinated Regional Climate Downscaling Experiment (CORDEX), whose protocols enable the periodic assessments carried out by the Intergovernmental Panel on Climate Change (IPCC).

As a trusted international repository, ESGF hosts and replicates data from a broad range of domains and communities in the Earth sciences. This leads to strong support for standards for connecting data and for the application of the FAIR data principles, ensuring free and open access and interoperability with similar systems in the Earth sciences.

ESGF includes a dedicated software component, ESGF Data Statistics, funded by the H2020 projects IS-ENES2 and IS-ENES3, which collects, analyzes, and visualizes data usage metrics and data archive information across the federation.

It provides a distributed and scalable software infrastructure that captures a set of metrics at both the single-site and the federation level. It collects and stores a high volume of heterogeneous metrics, covering coarse- and fine-grained measures such as download and client statistics, as well as aggregated cross-project and project-specific download statistics, thus offering a more user-oriented perspective on the scientific experiments.

This provides strong feedback on how much, how frequently, and how intensively the whole federation is exploited by end users, and on which data are downloaded most, which indicates the community's level of interest in specific data. It also gives feedback on less accessed data, which on the one hand can help in designing larger-scale experiments in the future and on the other hand can give insights into the long tail of research. On top of this, a view of the total amount of data published and available through ESGF lets users monitor the status of the data archive of the entire federation.
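
As a rough illustration of the kind of aggregation described above, the following minimal Python sketch summarizes download events at federation, project, and site level and ranks datasets by download count. It is not the ESGF Data Statistics implementation: the record layout, the field names, the `summarize` function, and the sample values are all hypothetical and chosen only for this example.

```python
from collections import Counter
from typing import NamedTuple


class DownloadRecord(NamedTuple):
    """One hypothetical download event; field names are illustrative only."""
    site: str        # data node that served the file
    project: str     # e.g. "CMIP6" or "CORDEX"
    dataset: str     # dataset identifier
    size_bytes: int  # volume transferred


def summarize(records):
    """Aggregate download counts and volume at federation, project and site level."""
    records = list(records)
    per_project = Counter()
    per_site = Counter()
    per_dataset = Counter()
    bytes_per_project = Counter()
    for r in records:
        per_project[r.project] += 1
        per_site[r.site] += 1
        per_dataset[r.dataset] += 1
        bytes_per_project[r.project] += r.size_bytes
    ranked = per_dataset.most_common()  # sorted from most to least downloaded
    return {
        "total_downloads": len(records),
        "downloads_per_project": dict(per_project),
        "downloads_per_site": dict(per_site),
        "bytes_per_project": dict(bytes_per_project),
        "most_downloaded": ranked[0] if ranked else None,
        "least_downloaded": ranked[-1] if ranked else None,
    }


# Made-up sample events, not real ESGF usage data.
SAMPLE = [
    DownloadRecord("site-a", "CMIP6", "tas_monthly", 100),
    DownloadRecord("site-a", "CMIP6", "tas_monthly", 100),
    DownloadRecord("site-b", "CMIP6", "pr_daily", 400),
    DownloadRecord("site-b", "CORDEX", "tasmax_eur", 50),
    DownloadRecord("site-a", "CMIP6", "tas_monthly", 100),
    DownloadRecord("site-b", "CMIP6", "pr_daily", 400),
]

summary = summarize(SAMPLE)
```

Note that in this sketch the "least downloaded" entry only covers datasets with at least one recorded download; identifying data that were never accessed would additionally require the list of published datasets from the archive.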

This contribution presents an overview of the Data Statistics capabilities as well as the main results in terms of data analysis and visualization.

How to cite: Nuzzo, A., Antonio, F., Mirto, M., Nassisi, P., Fiore, S., and Aloisio, G.: Tracking and reporting peta-scale data exploitation within the Earth System Grid Federation through the ESGF Data Statistics service, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-6831, https://doi.org/10.5194/egusphere-egu23-6831, 2023.
