EGU General Assembly 2023
© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Facilitating provenance documentation with a model-driven-engineering approach.

Lucy Bastin¹, Owen Reynolds¹, Antonio Garcia-Dominguez², and James Sprinks³
  • ¹Department of Computer Science, Aston University, Birmingham, UK
  • ²Department of Computer Science, University of York, York, UK
  • ³Earthwatch Europe, Oxford, UK

Evaluating the quality of data is a major concern within the scientific community: before any dataset is used in a study, its suitability must be carefully judged. This requires that the steps followed to acquire, select, and process the data have been documented thoroughly and methodically, in a way that can be clearly communicated to the rest of the community. This is particularly important in the field of citizen science, where a project that can clearly demonstrate its protocols, transformation steps, and quality assurance procedures has a much better chance of achieving social and scientific impact through the use and re-use of its data.

A number of specifications have been created to provide a common set of concepts and terminology, such as ISO 19115-3 and W3C PROV. These define a set of interchange formats but do not, in themselves, provide tooling to create high-quality dataset descriptions. The existing tools built on these standards (e.g. GeoNetwork, USGS Metadata Wizard, CKAN) are overly complex for some users (for example, many citizen science project managers) who, despite being experts in their own fields, may be unfamiliar with the structure and context of metadata standards or with semantic modelling.

In this presentation, we will describe a prototype authoring tool that was created using a model-driven engineering (MDE) software development methodology. The tool was authored using JetBrains Meta Programming System (MPS) to implement a modelling language based on the ISO 19115-3 model. A user is provided with a “text-like” editing environment, which assists with the formal structures needed to produce a machine-parsable document.

This allows a user to easily describe data lineage and generic processing steps while reusing recognised external vocabularies, with automated validation, autocompletion, and transformation to external formats (e.g. ISO 19115-3 XML or JSON-LD). We will report on the results of user testing aimed at making the tool accessible to citizen scientists (through dedicated projections with simplified structures and dialogue-driven model creation) and at evaluating with those users any new possibilities that comprehensive and machine-parsable provenance information may create for data integration and sharing. The prototype will also serve as a pilot test of the integration between ISO 19115-3 and existing or upcoming third-party vocabularies (such as the forthcoming ISO data quality measures registry).
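
To make the idea of validated, transformable lineage concrete, the following is a minimal Python sketch. It is not the MPS-based prototype and does not reproduce its output: the `ProcessStep` class, the `ALLOWED_STEP_TYPES` vocabulary, the `validate` and `to_jsonld` helpers, and the `ex:` identifiers are all hypothetical names introduced for illustration. The only external terms assumed are the W3C PROV and Dublin Core namespaces; ISO 19115-3 itself is not modelled here.

```python
import json
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary of step types (an assumption for this
# sketch); a real tool would draw on a recognised external vocabulary.
ALLOWED_STEP_TYPES = {"collection", "filtering", "aggregation"}


@dataclass
class ProcessStep:
    identifier: str
    step_type: str
    description: str
    inputs: list = field(default_factory=list)   # identifiers of datasets read
    outputs: list = field(default_factory=list)  # identifiers of datasets written


def validate(steps):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for step in steps:
        if step.step_type not in ALLOWED_STEP_TYPES:
            problems.append(f"{step.identifier}: unknown step type '{step.step_type}'")
        if not step.outputs:
            problems.append(f"{step.identifier}: step produces no output")
    return problems


def to_jsonld(steps):
    """Map each step to a prov:Activity and each output to a prov:Entity."""
    graph = []
    for step in steps:
        graph.append({
            "@id": step.identifier,
            "@type": "prov:Activity",
            "dcterms:description": step.description,
            "prov:used": [{"@id": i} for i in step.inputs],
        })
        for out in step.outputs:
            graph.append({
                "@id": out,
                "@type": "prov:Entity",
                "prov:wasGeneratedBy": {"@id": step.identifier},
            })
    return {
        "@context": {
            "prov": "http://www.w3.org/ns/prov#",
            "dcterms": "http://purl.org/dc/terms/",
            "ex": "http://example.org/",  # placeholder namespace
        },
        "@graph": graph,
    }


# Illustrative lineage for an imaginary citizen science survey.
steps = [
    ProcessStep("ex:survey", "collection", "Volunteers record sightings", [], ["ex:raw"]),
    ProcessStep("ex:qa", "filtering", "Remove unverified records", ["ex:raw"], ["ex:clean"]),
]
document = to_jsonld(steps)
if not validate(steps):
    print(json.dumps(document, indent=2))
```

In this sketch validation runs before export, loosely mirroring the way an editing environment could flag an unrecognised term before any interchange document is produced.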

How to cite: Bastin, L., Reynolds, O., Garcia-Dominguez, A., and Sprinks, J.: Facilitating provenance documentation with a model-driven-engineering approach, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-8321, 2023.