EGU25-13567, updated on 15 Mar 2025
https://doi.org/10.5194/egusphere-egu25-13567
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Tables as a way to deal with a variety of data formats and APIs in data spaces
Joan Masó¹, Marta Olivé¹, Alba Brobia¹, Nuria Julia¹, Nuria Cartell², and Uta Wehn³
  • ¹ CREAF, Bellaterra (Barcelona), Spain (joan.maso@uab.cat)
  • ² NILU-Norwegian Institute for Air Research, Kjeller, Norway
  • ³ IHE Delft Institute for Water Education, Delft, The Netherlands

The Green Deal Data Space was born in the big data paradigm, in which data come in a variety of formats and data models and are exposed as files or web APIs. As a result, we need to default to a simple data structure that is general enough to represent most of the more specific data models, formats and API payloads. Many data models have a structure that can be represented as tables.

TAPIS stands for "Tables from APIS". It is JavaScript code built on a common data model: an array of objects, each with a list of properties that can hold a simple or a complex value. TAPIS offers a series of operations that take one or more arrays of objects as input and produce a new array of objects as output. Some operations create the arrays of objects from files or API queries (i.e. data import), others manipulate the objects (e.g. merging two arrays into a single one), and others generate visual representations of the common data structure, such as a table, a map or a graph.
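The common data model and the "arrays in, new array out" style of operation can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: the function names (`mergeTables`, `columnsOf`, `toGrid`) and the sample values are hypothetical and are not taken from the TAPIS code base.

```javascript
// Hypothetical helpers; none of these names come from the TAPIS code base.
// The sample values are invented for illustration.

// Two small "tables": arrays of objects whose properties act as columns.
// A property may hold a simple value (string, number) or a complex one (object).
const stations = [
  { id: "s1", name: "Station A", location: { lat: 10, lon: 20 } },
  { id: "s2", name: "Station B", location: { lat: 30, lon: 40 } },
];
const readings = [{ id: "s3", name: "Station C", temperature: 11.2 }];

// Merge operation: takes two arrays of objects and returns a new array,
// leaving both inputs untouched.
function mergeTables(a, b) {
  return [...a, ...b];
}

// Column names are the union of the property names found in the rows,
// in order of first appearance.
function columnsOf(rows) {
  const names = new Set();
  for (const row of rows) {
    for (const key of Object.keys(row)) names.add(key);
  }
  return [...names];
}

// Tabular view: one array per row, with null where a row lacks a property.
function toGrid(rows) {
  const columns = columnsOf(rows);
  return rows.map((row) => columns.map((c) => (c in row ? row[c] : null)));
}

const merged = mergeTables(stations, readings);
console.log(columnsOf(merged));
console.log(toGrid(merged));
```

Rows do not need to share the same properties: the table view fills the gaps with `null`, which is one simple way for a single structure to absorb heterogeneous payloads.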

TAPIS is limited by its own data model. While many data models can be mapped to the common data model, a multidimensional data cube or a data tree cannot be represented efficiently in a single table. In the context of the Green Deal Data Space, most sensor data, statistical data, feature-based geospatial data and administrative data can be considered object-based data and can be used in TAPIS.

TAPIS is able to connect to the SensorThings API (the sensor protocol selected in AD4GD and CitiObs), S3 buckets (the internal cloud repository used in AD4GD), GeoNetwork (the geospatial metadata catalogue selected in AD4GD and more4nature), and OGC API - Features and its derivatives (the modern web APIs standardized by the OGC). Other data inputs will be incorporated, such as Citizen Science data sources and other popular APIs used in the more4nature project. More analytical functionalities are going to be incorporated in the CitiObs project. As part of the AD4GD Green Deal Information Model, there is an operation that associates semantics with each column of a table by linking it to a URI that defines the concept in an external vocabulary (and to units of measure, if appropriate).

To be compatible with the data space architecture recommended by the International Data Spaces Association, we are working on supporting the catalogue of the Eclipse Dataspace Connector and on negotiating a digital contract as a step prior to requesting access to the relevant data offered in the data space. To do so, we are incorporating the Dataspace Protocol into the TAPIS operations for data import. TAPIS is available as open source at https://github.com/joanma747/TAPIS.
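As an illustration of the import and column semantics operations mentioned above, the following sketch flattens a SensorThings-style Observations response into rows and then links one column to a concept URI and a unit URI. It makes several assumptions: the payload follows the usual SensorThings layout with a `value` array, the values are invented, the function names (`rowsFromObservations`, `createTable`, `annotateColumn`) are hypothetical, and the `example.org` URIs are placeholders rather than real vocabulary entries. It does not reflect how TAPIS stores this information internally.

```javascript
// Illustrative payload in the shape of a SensorThings API Observations
// response; the values are invented for this example.
const payload = {
  value: [
    { "@iot.id": 1, phenomenonTime: "2025-01-01T00:00:00Z", result: 4.2 },
    { "@iot.id": 2, phenomenonTime: "2025-01-01T01:00:00Z", result: 4.6 },
  ],
};

// Hypothetical import step: keep one flat row per observation.
function rowsFromObservations(response) {
  return response.value.map((obs) => ({
    id: obs["@iot.id"],
    time: obs.phenomenonTime,
    result: obs.result,
  }));
}

// Hypothetical table wrapper: rows plus per-column metadata.
function createTable(rows) {
  return { rows, columns: {} };
}

// Link a column to a concept URI and, optionally, a unit-of-measure URI.
// Returns a new table; the input is not modified.
function annotateColumn(table, name, definition, unit) {
  if (!table.rows.some((row) => name in row)) {
    throw new Error(`Unknown column: ${name}`);
  }
  const meta = unit === undefined ? { definition } : { definition, unit };
  return { ...table, columns: { ...table.columns, [name]: meta } };
}

const table = createTable(rowsFromObservations(payload));
const annotated = annotateColumn(
  table,
  "result",
  "https://example.org/vocabulary/air-temperature", // placeholder URI
  "https://example.org/units/degree-celsius" // placeholder URI
);
console.log(annotated.columns);
```

Keeping the semantics in a separate `columns` map leaves the rows themselves unchanged, so operations that only read the array of objects would keep working on an annotated table.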

AD4GD, CitiObs and more4nature are Horizon Europe projects co-funded by the European Union, Switzerland and the United Kingdom.

How to cite: Masó, J., Olivé, M., Brobia, A., Julia, N., Cartell, N., and Wehn, U.: Tables as a way to deal with a variety of data formats and APIs in data spaces, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-13567, https://doi.org/10.5194/egusphere-egu25-13567, 2025.