EGU25-9663, updated on 14 Mar 2025
https://doi.org/10.5194/egusphere-egu25-9663
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Wednesday, 30 Apr, 14:00–15:45 (CEST), Display time Wednesday, 30 Apr, 14:00–18:00
 
Hall X2, X2.44
NFS-FAIR-DDP: the data documentation procedure developed by NuoroForestrySchool as an open source tool to upgrade entry-level data sharing by exploiting the SQL standard
Roberto Scotti, Filippo Giadrossich, and Agathe Casalta Badetti
  • NuoroForestrySchool-DipAGR-UniSS.it, Università di Sassari, Nuoro, Italy

CSV and Excel are among the most common formats for data sharing, especially in scientific and government contexts. Chaves-Fraga et al. (2021) note that a significant amount of public data is published in tabular formats such as CSV and Excel, whose lack of standardized metadata can hinder data accessibility and interoperability. This is in line with the findings of Van den Burg et al. (2019), who highlight that although CSV files are widely used because of their simplicity, they often lack the metadata needed to ensure data quality and provenance, both of which are crucial for compliance with the FAIR principles. Similarly, Kaur et al. (2021) point out that many health information systems allow data to be exported in CSV format, which is accessible but does not provide the semantic interoperability needed for effective data sharing and reuse. Moreover, the limitations of CSV and Excel formats are compounded when datasets are converted to SQLite databases.

The NFS group (NuoroForestrySchool.io) has developed an open source Python-based application (https://gitlab.com/NuoroForestrySchool/nfs-data-documentation-procedure) that facilitates the organization of the data a researcher is willing to share. 

The application can be used as a command line tool or through a graphical interface. As input it reads a spreadsheet file with one sheet for each table, plus an application-specific sheet defining the database schema, the data dictionary, the DataCite metadata, and other specific metadata (extended title, abstract/summary). The procedure produces three outputs: a SQLite file containing all the data and metadata, an image of the graphical ERD-like schema, and a formal PDF document presenting the contents of the database. The SQLite file is a metadata-rich SQL-based database that takes full advantage of relational features, thus improving data accessibility, interoperability, and reusability by humans and machines.
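The general idea of packing tables, a data dictionary, and descriptive metadata into one SQLite file can be illustrated with a minimal Python sketch. This is not the NFS-FAIR-DDP code: the function `build_database`, the layout of `schema`, the table names `dataset_metadata` and `data_dictionary`, and all sample values are hypothetical and invented for illustration, and plain Python dictionaries stand in for the sheets of the input spreadsheet.

```python
import sqlite3

# Made-up stand-in for the input workbook: one entry per data sheet.
sheets = {
    "plots": [
        {"plot_id": 1, "site": "Site A"},
        {"plot_id": 2, "site": "Site B"},
    ],
    "trees": [
        {"tree_id": 1, "plot_id": 1, "dbh_cm": 23.5},
        {"tree_id": 2, "plot_id": 1, "dbh_cm": 31.0},
        {"tree_id": 3, "plot_id": 2, "dbh_cm": 18.2},
    ],
}

# Hypothetical content of the schema / data dictionary sheet:
# (column name, SQL type, description), plus key declarations.
schema = {
    "plots": {
        "columns": [
            ("plot_id", "INTEGER", "Unique plot identifier"),
            ("site", "TEXT", "Name of the sampling site"),
        ],
        "primary_key": "plot_id",
        "foreign_keys": [],
    },
    "trees": {
        "columns": [
            ("tree_id", "INTEGER", "Unique tree identifier"),
            ("plot_id", "INTEGER", "Plot in which the tree was measured"),
            ("dbh_cm", "REAL", "Diameter at breast height, in centimetres"),
        ],
        "primary_key": "tree_id",
        "foreign_keys": [("plot_id", "plots", "plot_id")],
    },
}

# Made-up descriptive metadata (title and abstract of the dataset).
metadata = {
    "title": "Illustrative forest inventory (made-up data)",
    "abstract": "Two plots and three trees, used only as an example.",
}


def build_database(conn, sheets, schema, metadata):
    """Write data, data dictionary, and metadata into one SQLite database."""
    conn.execute("PRAGMA foreign_keys = ON")  # enforce declared relations
    conn.execute("CREATE TABLE dataset_metadata (key TEXT PRIMARY KEY, value TEXT)")
    conn.execute(
        "CREATE TABLE data_dictionary ("
        "table_name TEXT, column_name TEXT, sql_type TEXT, description TEXT, "
        "PRIMARY KEY (table_name, column_name))"
    )
    conn.executemany("INSERT INTO dataset_metadata VALUES (?, ?)", metadata.items())

    for table, spec in schema.items():
        names = [name for name, _, _ in spec["columns"]]
        defs = [f'"{name}" {sql_type}' for name, sql_type, _ in spec["columns"]]
        defs.append(f'PRIMARY KEY ("{spec["primary_key"]}")')
        for col, ref_table, ref_col in spec["foreign_keys"]:
            defs.append(f'FOREIGN KEY ("{col}") REFERENCES "{ref_table}" ("{ref_col}")')
        conn.execute(f'CREATE TABLE "{table}" ({", ".join(defs)})')

        # Record each column's documentation next to the data itself.
        conn.executemany(
            "INSERT INTO data_dictionary VALUES (?, ?, ?, ?)",
            [(table, name, sql_type, desc) for name, sql_type, desc in spec["columns"]],
        )
        # Load the rows of the corresponding sheet.
        placeholders = ", ".join("?" for _ in names)
        conn.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})',
            [[row[name] for name in names] for row in sheets[table]],
        )
    conn.commit()


conn = sqlite3.connect(":memory:")  # use a file path to obtain a shareable file
build_database(conn, sheets, schema, metadata)

# A data user can read the documentation with plain SQL.
for row in conn.execute(
    "SELECT table_name, column_name, description FROM data_dictionary "
    "ORDER BY table_name, column_name"
):
    print(row)
```

Because the keys are declared in the database itself, SQLite rejects a tree that refers to a non-existent plot, a check that a CSV or spreadsheet file cannot express. Parent tables must be listed before the tables that reference them for the row loading to succeed under this sketch.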

The use of the procedure is demonstrated by processing a simple but significant use case.

LITERATURE

Chaves-Fraga, David, Edna Ruckhaus, Freddy Priyatna, Maria-Esther Vidal, and Oscar Corcho. 2021. "Enhancing virtual ontology based access over tabular data with Morph-CSV". Edited by Axel-Cyrille Ngonga Ngomo, Muhammad Saleem, Ruben Verborgh, Muhammad Intizar Ali, and Olaf Hartig. Semantic Web 12 (6): 869–902. https://doi.org/10.3233/SW-210432.
Kaur, Jasleen, Jasmine Kaur, Shruti Kapoor, and Harpreet Singh. 2021. "Design & Development of Customizable Web API for Interoperability of Antimicrobial Resistance Data". Scientific Reports 11 (1): 11226. https://doi.org/10.1038/s41598-021-90601-z.
Van Den Burg, G. J. J., A. Nazábal, and C. Sutton. 2019. "Wrangling Messy CSV Files by Detecting Row and Type Patterns". Data Mining and Knowledge Discovery 33 (6): 1799–1820. https://doi.org/10.1007/s10618-019-00646-y.

How to cite: Scotti, R., Giadrossich, F., and Casalta Badetti, A.: NFS-FAIR-DDP: the data documentation procedure developed by NuoroForestrySchool as an open source tool to upgrade entry-level data sharing by exploiting the SQL standard, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-9663, https://doi.org/10.5194/egusphere-egu25-9663, 2025.