EGU25-16237, updated on 15 Mar 2025
https://doi.org/10.5194/egusphere-egu25-16237
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Tuesday, 29 Apr, 19:25–19:55 (CEST)
 
Room G1
Rethinking HOW We Create Global Networks of Earth and Environmental Datasets to Maximise Their Potential to Underpin Integrated Research for a Sustainable Planet.
Lesley Wyborn
  • Australian National University, National Computational Infrastructure, Acton, Australia (lesley.wyborn@anu.edu.au)

Earth System Science datasets have been acquired for centuries across five broad spheres: geosphere, cryosphere, hydrosphere, biosphere and atmosphere. They range from human observations to sensor-derived measurements, and from nanoscale laboratory data to large-volume petascale datasets collected remotely by satellites, drones, etc. Across all spheres most datasets have their roots in three core disciplines: Geology, Geophysics and Geochemistry. Today we are generating unprecedented volumes of data and, combined with computing capacity now at exascale, our capability to integrate and analyse data should be unparalleled.

Digital data repositories emerged around 1980 and the internet soon after. Initially data was shared by shipping hard media. The internet soon enabled global data sharing, including via web services (e.g., OneGeology in 2008). Multiple global data sharing networks were envisioned, but few moved beyond those that proposed them. Machine-to-machine data sharing is still a challenge. Many spheres cannot utilise the existing capacity of computers, including the full potential of AI applications, because these cannot read the volumes of available data.

History has repeatedly shown that revolutionary infrastructures can take decades to realise their full potential and change from being a new way of doing things to multiple ways of doing new things. 

The FAIR principles were specifically designed to increase machine-to-machine interoperability of data: they are the blueprint of WHAT needs to be done, but the HOW will involve rethinking three key steps.

Firstly, shift the onus of aggregating data from the consumer to repositories capable of implementing discipline-centric FAIR (meta)data standards.

Secondly, as recommended by the WorldFAIR Second Policy Brief to the European Open Science Cloud (EOSC), change from a bibliographic approach to data stewardship to one of data engineering, where richer and more comprehensive standardised (meta)data at the datum level enables machine-to-machine access to specific variables of interest across multiple disciplinary datasets. Take a more holistic approach to standards development (e.g., the Observations, Measurements and Samples standard (ISO 19156:2023)) and identify common universals across disciplines (e.g., time, place, units of measure). Initiatives like OneGeology and OneGeochemistry, and hopefully soon OneGeophysics, can support higher-level discipline-centric (meta)data standards. Standards coordination groups (e.g., CODATA, Research Data Alliance) are critical. Persistent identifiers (PIDs) at the object level will be essential.
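As a rough illustration of what datum-level (meta)data could make possible, the sketch below attaches the common universals named above (time, place, units of measure) and a PID to every value, so that one variable can be pulled from several disciplinary datasets and converted to a common unit without manual reformatting. All names here (`Datum`, `select_variable`, the `pid:example/...` identifiers, the sample values) are hypothetical and are not taken from ISO 19156:2023 or any repository API; it is a minimal sketch under those assumptions, not a proposed implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Datum:
    """One observed value carrying its own metadata (hypothetical schema)."""
    pid: str          # persistent identifier at the object level (made up)
    discipline: str   # originating disciplinary dataset
    variable: str     # variable of interest, e.g. "Cu"
    value: float
    unit: str         # unit of measure as recorded by the source
    time: datetime    # common universal: time
    lat: float        # common universal: place
    lon: float


# Illustrative mass-fraction conversions to mg/kg (1 ppm = 1 mg/kg, 1 wt% = 10 000 mg/kg).
TO_MG_PER_KG = {"mg/kg": 1.0, "ppm": 1.0, "wt%": 10_000.0}


def select_variable(data, variable, bbox):
    """Return (pid, value in mg/kg) for each datum of `variable` inside `bbox`.

    `bbox` is (min_lat, max_lat, min_lon, max_lon).
    """
    min_lat, max_lat, min_lon, max_lon = bbox
    selected = []
    for d in data:
        if d.variable != variable:
            continue
        if not (min_lat <= d.lat <= max_lat and min_lon <= d.lon <= max_lon):
            continue
        selected.append((d.pid, d.value * TO_MG_PER_KG[d.unit]))
    return selected


# Invented records standing in for three separately curated datasets.
T0 = datetime(2020, 1, 1, tzinfo=timezone.utc)
DATA = [
    Datum("pid:example/rock-001", "geochemistry", "Cu", 0.5, "wt%", T0, -35.3, 149.1),
    Datum("pid:example/soil-042", "biosphere", "Cu", 120.0, "ppm", T0, -33.9, 151.2),
    Datum("pid:example/soil-043", "biosphere", "Zn", 80.0, "ppm", T0, -33.9, 151.2),
    Datum("pid:example/rock-777", "geochemistry", "Cu", 0.25, "wt%", T0, 48.2, 16.4),
]

if __name__ == "__main__":
    # Copper values from all datasets within an arbitrary bounding box.
    for pid, mg_per_kg in select_variable(DATA, "Cu", (-45.0, -10.0, 110.0, 155.0)):
        print(pid, mg_per_kg)
```

The point of the sketch is that the consumer asks for a variable, a place and a unit, not for a particular file: the selection works only because each datum carries standardised metadata rather than relying on a dataset-level bibliographic description.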

Thirdly, prioritise which datasets are made fully FAIR compliant and fund their curation in repositories that offer discipline-based curation. The 2019 Beijing Declaration on Research Data notes that ‘publicly funded research data should be interoperable, and preferably without further manipulation or conversion, to facilitate their broad reuse in scientific research’. The myriad of data products generated from these primary data sources can go to generalist and institutional repositories.

Revolutionary infrastructures do take time to realise their full potential. It is nearly 25 years since the early experiments using the internet to globally network data repositories. The WorldFAIR Second EOSC Policy Brief emphasises that the change to machine-actionable FAIR data ‘is one of a magnitude which will necessitate considerable resourcing, investment, and upskilling; but it will also achieve significant benefits’, including creating a digitally integrated Earth to support sustainable development of our planet.

How to cite: Wyborn, L.: Rethinking HOW We Create Global Networks of Earth and Environmental Datasets to Maximise Their Potential to Underpin Integrated Research for a Sustainable Planet., EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-16237, https://doi.org/10.5194/egusphere-egu25-16237, 2025.