SC5.1 | Communicating data quality through open reproducible research
Co-organized by CL6/ESSI3/GM12/NH12/OS5
Convener: Markus Konkol | Co-convener: Simon Jirka
Tue, 25 Apr, 14:00–15:45 (CEST)
Room -2.85/86
Policies and decisions are often based on data products such as dynamic maps and time series. The underlying data should ideally be of high quality, but generating complete and accurate data is often costly. Combining sparsely deployed, accurate sensors with low-cost instruments is one way to overcome this issue, but it raises interoperability challenges. Moreover, the quality of the combined data, and the way the resulting data product (e.g., a map showing an interpolation) is generated, need to be communicated transparently to users. An aggravating factor is that quality is not an absolute indicator but may depend on the use case and other factors (e.g., accuracy/precision of the sensors, deployment, data management). A computational notebook (e.g., R Markdown) can help to communicate how the quality of a dataset and of the data product is calculated. For example, the notebook can show which observations are included in, or excluded from, a map showing an interpolation.
In this short course, we will show how reproducible computational notebooks can help to communicate information on data quality effectively and transparently, allowing users to understand, verify, and build on shareable workflows. To this end, we will demonstrate a use case from the EU-funded project MINKE that shows how cooperation between the metrology and oceanographic communities can improve the reliability and use of data to address wicked problems related to “Life below water” (SDG 14). MINKE focuses on data quality and interoperability and aims to improve the use of existing research infrastructures and to stimulate collaboration across research fields and citizen science.
In this hands-on course, we will apply tools for publishing reproducible research, including R, R Markdown, Binder, and Git. We will also touch upon issues related to the computational environment and data management, thus covering Open Science principles (e.g., open code and data). The course is open to everyone interested in the reproducibility of R-based workflows. We invite participants to follow the use case on their laptops and experiment with the computational workflow. Basic knowledge of R is required; knowledge of the other technologies is helpful but optional. The workflows will be reproducible in the browser. While the use case is from MINKE, the reproducibility concepts apply to other scenarios based on computational workflows.
