Environmental data value stream as traceable linked data - Iliad Digital Twin of the Ocean case
- 1Open Geospatial Consortium, Innovation Program, Belgium
- 2Poznan Supercomputer and Networking Center, Poland
- 3SINTEF Ocean, Norway
- 4BLB, NOrway
- 5HubOcean, Norway
In the distributed heterogeneous environmental data ecosystems, the number of data sources, volume and variances of derivatives, purposes, formats, and replicas are increasingly growing. In theory, this can enrich the information system as a whole, enabling new data value to be revealed via the combination and fusion of several data sources and data types, searching for further relevant information hidden behind the variety of expressions, formats, replicas, and unknown reliability. It is now visible how complex data alignment is, and even more, it is not always justified due to capacity and business issues. One of the challenging, but also most rewarding approaches is semantic alignment, which promises to fill the information gap of data discovery and joins. To formalise one, an inevitable enabler is an aligned, linked, and machine readable data model enabling the specification of relations between data elements generated information. The Iliad - digital twins of the ocean are cases of this kind, where in-situ data and citizen science observations are mixed with multidimensional environmental data to enable data science and what-if models implementation and to be integrated into even broader ecosystems like the European Digital Twin Ocean (EDITO) and European Data Spaces. An Ocean Information Model (OIM) that will enable traversals and profiles is the semantic backbone of the ecosystem. Defined as the multi-level ontology, it will explain data using well known generic (Darwin Core, WoT), spatio-temporal (SOSA/SSN, OGC Geo, W3C Time, QUDT, W3C RDF Data Cube, WoT) and domain (WORMS, AGROVOC) ontologies. Machine readability and unambiguity allow for both automated validation and some translations.
On the other hand, efficient use of this requires yet another skill in data management and development besides GIS, ICT and domain expertise. In addition, as the semantics used in the data and metadata have not yet been stabilised on the implementation level, it introduces a few more flexibilities of data expression. Following the GEO data sharing and data management principles along with FAIR, CARE and TRUST, the environmental data is prepared for harmonisation. Furthermore, to ease the entry and to harmonise conventions, the authors introduce a multi-touchpoint data value chain API suite with an aligned approach to semantically enrich, entail and validate data sets such as observations streams in JSON or JSON-LD based on OIM, through storage and scientific data in NetCDF to exposing this semantically aligned data via the newly endorsed and already successful OGC Environmental Data Retrieval API. The practical approach is supported by a ready-to-use toolbox of components that presents portable tools to build and validate multi-source geospatial data integrations keeping track of the information added during mesh-up and predictions and what-if implementations.
How to cite: Zaborowski, P., Atkinson, R., Villar Fernandez, A., Palma, R., Brönner, U., Berre, A., Bye, B. L., Redd, T., and Voidrot, M.-F.: Environmental data value stream as traceable linked data - Iliad Digital Twin of the Ocean case, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-14237, https://doi.org/10.5194/egusphere-egu23-14237, 2023.