- NuoroForestrySchool-DipAGR-UniSS.it, Università di Sassari, Nuoro, Italy
CSV and Excel files are among the most common formats for data sharing, especially in scientific and government contexts. Chaves-Fraga notes that a significant amount of public data is published in tabular formats such as CSV and Excel, which can hinder data accessibility and interoperability because these formats lack standardized metadata (Chaves-Fraga, 2020). This is in line with the findings of Burg et al. (2019), who highlight that although CSV files are widely used for their simplicity, they often lack the metadata needed to ensure data quality and provenance, both of which are crucial for compliance with the FAIR principles. Similarly, Kaur et al. (2021) point out that many health information systems allow data to be exported in CSV format, which is accessible but does not provide the semantic interoperability needed for effective data sharing and reuse. Moreover, the limitations of CSV and Excel formats are compounded when datasets are converted to SQLite databases.
The NFS group (NuoroForestrySchool.io) has developed an open source Python-based application (https://gitlab.com/NuoroForestrySchool/nfs-data-documentation-procedure) that helps researchers organize and document the data they wish to share.
The application can be used as a command-line tool or through a graphical interface. It takes as input a spreadsheet file with one sheet for each table, plus an application-specific sheet defining the database schema, the data dictionary, the DataCite metadata, and other specific metadata (extended title, abstract/summary). The procedure outputs a SQLite file containing all the data and metadata, an image of the ERD-like graphical schema, and a formal PDF document presenting the contents of the database. The SQLite file is a metadata-rich, SQL-based database that takes full advantage of relational features, thereby improving data accessibility, interoperability, and reusability by both humans and machines.
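The core idea of the spreadsheet-to-SQLite step can be illustrated with a minimal sketch that uses only Python's standard `sqlite3` module. It assumes the sheets have already been read into Python lists; the table names (`plots`, `trees`, `data_dictionary`, `metadata`) and the layout of the schema rows are hypothetical and are not the NFS application's actual input format or API. The sketch covers only the SQLite output, not the ERD image or the PDF document.

```python
import sqlite3

# Hypothetical schema sheet: one row per column
# (table, column, SQL type, constraint, description for the data dictionary).
SCHEMA = [
    ("plots", "plot_id", "INTEGER", "PRIMARY KEY", "Plot identifier"),
    ("plots", "area_m2", "REAL", "NOT NULL", "Plot area in square metres"),
    ("trees", "tree_id", "INTEGER", "PRIMARY KEY", "Tree identifier"),
    ("trees", "plot_id", "INTEGER", "REFERENCES plots(plot_id)", "Plot containing the tree"),
    ("trees", "dbh_cm", "REAL", "", "Diameter at breast height in cm"),
]

# Hypothetical data sheets: one list of rows per table, in schema column order.
DATA = {
    "plots": [(1, 400.0), (2, 400.0)],
    "trees": [(10, 1, 23.5), (11, 1, 31.0), (12, 2, 18.2)],
}

# Hypothetical DataCite-style metadata stored as key/value pairs.
METADATA = {"title": "Example forest inventory", "publicationYear": "2025"}


def build_database(path=":memory:"):
    """Create one SQLite file holding the data tables, data dictionary and metadata."""
    con = sqlite3.connect(path)
    # Enforce the relational constraints declared in the schema sheet.
    con.execute("PRAGMA foreign_keys = ON")

    # Group column definitions by table, preserving sheet order.
    tables = {}
    for table, column, sqltype, constraint, _ in SCHEMA:
        tables.setdefault(table, []).append(f"{column} {sqltype} {constraint}".strip())
    # Identifiers come from the (trusted) schema sheet, not from end users.
    for table, cols in tables.items():
        con.execute(f"CREATE TABLE {table} ({', '.join(cols)})")

    # Load the data rows; parent tables are listed before child tables.
    for table, rows in DATA.items():
        placeholders = ", ".join("?" * len(rows[0]))
        con.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)

    # Store the data dictionary and the metadata inside the same file.
    con.execute("CREATE TABLE data_dictionary (table_name TEXT, column_name TEXT, description TEXT)")
    con.executemany(
        "INSERT INTO data_dictionary VALUES (?, ?, ?)",
        [(t, c, d) for t, c, _, _, d in SCHEMA],
    )
    con.execute("CREATE TABLE metadata (key TEXT PRIMARY KEY, value TEXT)")
    con.executemany("INSERT INTO metadata VALUES (?, ?)", METADATA.items())
    con.commit()
    return con


if __name__ == "__main__":
    con = build_database()
    # Relational query across tables: number of trees per plot.
    print(con.execute(
        "SELECT p.plot_id, COUNT(*) FROM trees t "
        "JOIN plots p ON p.plot_id = t.plot_id "
        "GROUP BY p.plot_id ORDER BY p.plot_id"
    ).fetchall())
```

Keeping the data dictionary and metadata as ordinary tables in the same file means that any SQL client can read the documentation alongside the data, and the declared foreign key lets SQLite reject rows that reference a non-existent plot.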
The procedure is demonstrated by processing a simple but representative use case.
How to cite: Scotti, R., Giadrossich, F., and Casalta Badetti, A.: NFS-FAIR-DDP the data documentation procedure developed by NuoroForestrySchool as open source tool to upgrade entry level data sharing by exploiting the SQL standard, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-9663, https://doi.org/10.5194/egusphere-egu25-9663, 2025.