EGU23-7786
https://doi.org/10.5194/egusphere-egu23-7786
EGU General Assembly 2023
© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Constructing a Searchable Knowledge Repository for FAIR Climate Data

Mark Roantree1, Branislava Lalić2, Stevan Savić3, Dragan Milošević3, and Michael Scriney4
  • 1Insight Centre for Data Analytics, Dublin City University, Dublin, Ireland (mark.roantree@dcu.ie)
  • 2Faculty of Agriculture, University of Novi Sad, Serbia
  • 3Climatology and Hydrology Research Centre, University of Novi Sad, Serbia
  • 4School of Computing, Dublin City University, Dublin, Ireland

The development of a knowledge repository for climate science data is a multidisciplinary effort involving domain experts (climate scientists), data engineers whose skills include designing and building a knowledge repository, and machine learning researchers who provide expertise on data preparation tasks such as gap filling and advise on the machine learning models that can exploit this data.

One of the main goals of the CA20108 COST Action is to develop a knowledge portal that is fully compliant with the FAIR principles for scientific data management. In the first year, a bespoke knowledge portal was developed to capture metadata for FAIR datasets. Its purpose was to provide detailed metadata descriptions for shareable micro-meteorological (micromet) data using the WMO standard. While storing Network, Site and Sensor metadata locally, the system passes the actual data to Zenodo, receives back the DOI and thus creates a permanent link between the Knowledge Portal and the Zenodo storage platform. When the user searches the Knowledge Portal (metadata), results provide both detailed descriptions and links to data on the Zenodo platform. Our adherence to the FAIR principles is documented below:

  • Findable. Machine-readable metadata is required for automatic discovery of datasets and services. Data owners supply a metadata description for all micro-meteorological data shared on the system, and this description drives the search engine, which accepts keywords or network, site and sensor search terms.
  • Accessible. When suitable datasets have been identified, access details should be provided. Assuming data is freely accessible, Zenodo DOIs and links are provided for direct data access.
  • Interoperable. Data interoperability means the ability to share and integrate data from different users and sources. This can only happen if a standard (meta)data model is employed to describe data, an important concept which generally requires data engineering skills to deliver. In the knowledge portal presented here, the WMO guide provides the design and structure for metadata.    
  • Reusable. To truly deliver reusability, metadata should be expressed in as detailed a manner as possible. In this way, data can be replicated and integrated according to different scientific requirements. Although the Knowledge Portal supports very detailed metadata descriptions, not all metadata is compulsory, because in some cases the overhead of providing this information can be very costly.
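
As an illustration only, the Network, Site and Sensor metadata and the keyword-driven search described above could be sketched as follows. This is a minimal sketch under stated assumptions, not the Portal's actual schema or code: the class names, field names, the `search` and `doi_link` helpers, and the sensor vocabulary are all hypothetical, and the real metadata model follows the WMO guide in far more detail.

```python
from dataclasses import dataclass, field


@dataclass
class Sensor:
    sensor_type: str       # hypothetical vocabulary, e.g. "air_temperature"
    description: str = ""  # optional: not all metadata is compulsory


@dataclass
class Site:
    name: str
    sensors: list[Sensor] = field(default_factory=list)


@dataclass
class Network:
    name: str
    keywords: list[str]
    doi: str               # DOI received back from the storage platform
    sites: list[Site] = field(default_factory=list)


def search(networks, term):
    """Return networks whose name, keywords, site names or sensor types contain the term."""
    needle = term.lower()
    hits = []
    for net in networks:
        haystack = [net.name, *net.keywords]
        for site in net.sites:
            haystack.append(site.name)
            haystack.extend(sensor.sensor_type for sensor in site.sensors)
        if any(needle in text.lower() for text in haystack):
            hits.append(net)
    return hits


def doi_link(network):
    """Build a resolvable link to the stored data from the network's DOI."""
    return f"https://doi.org/{network.doi}"
```

In this sketch a search hit carries both the descriptive metadata and, via `doi_link`, the pointer to the data itself, which mirrors the split between local metadata and externally stored data.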

Simple analytics are in place to monitor the volume and size of networks in the system. Current metrics include: network count; average size of network (number of sites); dates and size of datasets per network/site; and numbers and types of sensors in each site. The Portal is currently in Beta, meaning that the system is functional but open only to members of the COST Action who are nominated testers. This status is due to change in Q1/2023, when access will be open to the wider climate science community.
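
For illustration, a few of these metrics could be computed along the following lines. This is a hedged sketch, not the Portal's implementation: the `portal_metrics` function and the nested-dictionary input layout (network name → site name → list of sensor types) are assumptions made for the example.

```python
from collections import Counter


def portal_metrics(networks):
    """Summarise networks given as {network name: {site name: [sensor types]}}."""
    network_count = len(networks)
    total_sites = sum(len(sites) for sites in networks.values())
    # Average network size is measured in number of sites per network.
    avg_sites = total_sites / network_count if network_count else 0.0
    sensor_types = Counter(
        sensor_type
        for sites in networks.values()
        for sensors in sites.values()
        for sensor_type in sensors
    )
    return {
        "network_count": network_count,
        "avg_sites_per_network": avg_sites,
        "sensor_type_counts": dict(sensor_types),
    }
```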

Current plans include new Tools and Services to assess the quality of data, including the level of gaps. In some cases, machine learning tools will be provided to attempt gap filling for datasets that meet certain requirements.
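
To make the idea of a gap level concrete, the sketch below measures the share of missing readings in a regularly sampled series and fills only short gaps. It is an assumption-laden illustration: the function names and the `max_gap` threshold are hypothetical, missing readings are represented as `None`, and plain linear interpolation stands in for the machine learning tools mentioned above, which it does not attempt to reproduce.

```python
def gap_fraction(values):
    """Share of missing (None) readings in a regularly sampled series."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def fill_short_gaps(values, max_gap=2):
    """Linearly interpolate runs of None that are at most max_gap long.

    Gaps at the start or end of the series, and gaps longer than max_gap,
    are left untouched. Linear interpolation is a simple stand-in here,
    not a machine learning method.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # index of the first reading after the gap
        gap_len = end - start
        if start > 0 and end < n and gap_len <= max_gap:
            left, right = filled[start - 1], filled[end]
            step = (right - left) / (gap_len + 1)
            for k in range(gap_len):
                filled[start + k] = left + step * (k + 1)
    return filled
```

A threshold such as `max_gap` reflects the statement that gap filling is attempted only for datasets meeting certain requirements; what those requirements are in practice is not specified here.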

 

How to cite: Roantree, M., Lalić, B., Savić, S., Milošević, D., and Scriney, M.: Constructing a Searchable Knowledge Repository for FAIR Climate Data, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-7786, https://doi.org/10.5194/egusphere-egu23-7786, 2023.
