Operationalizing Data Fitness-for-Purpose Through Standardized Metrics, Local Uncertainty, and LLM-Extracted Quality Reasoning

Markus Möller; Mahdi Hedayat Mahmoudi; Paul Peschel

doi:https://doi.org/10.5194/egusphere-egu26-18803

Session ESSI3.2

EGU26-18803, updated on 14 Mar 2026

EGU General Assembly 2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

Julius Kühn Institute, Digitalisation and Artificial Intelligence, Kleinmachnow, Germany (markus.moeller@julius-kuehn.de)

Making geospatial data FAIR requires more than metadata standardization: it demands transparent, structured reporting of data quality and uncertainty that allows researchers to assess fitness-for-purpose across diverse applications. Yet most FAIR implementations still treat quality as a generic metadata field, while uncertainty and fitness‑for‑purpose remain buried in narrative documentation and disciplinary tacit knowledge.

In the FAIRagro consortium, we operationalize an application‑oriented quality framework using the example of Germany‑wide phenology time series (1 km, 1993-2022) by combining three components: (1) standardized producer‑side quality metrics (global R² and RMSE following ISO 19157‑1 for each crop, phase, and year), (2) spatially explicit local uncertainty layers, and (3) a machine‑actionable, application‑specific data quality matrix (AS‑DQM) that captures documented use contexts, validation strategies, limitations, and fitness‑for‑purpose statements from existing publications and workflow descriptions. Large Language Models (LLMs) are central to this workflow: after structure‑preserving conversion of PDFs to enriched Markdown, multimodal LLMs extract quality‑relevant concepts from text, tables, and figures, normalize them against a formal schema, and generate provenance‑linked AS‑DQM JSON profiles that can be queried and reused across applications.
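As an illustration of component (1) and of what a machine-readable profile entry might look like, the following minimal Python sketch computes a global RMSE and R² for one crop, phase, and year and stores them in a JSON record. It is not the authors' implementation: the abstract does not publish the AS‑DQM schema, so every field name, the crop and phase labels, the source identifier, and the sample values are hypothetical. R² is assumed here to be the coefficient of determination (1 minus residual over total sum of squares); the abstract does not state which definition is used.

```python
import json
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up day-of-year values for illustration only; not taken from the dataset.
observed_doy = [100.0, 110.0, 120.0, 130.0]
predicted_doy = [102.0, 108.0, 123.0, 129.0]

# Hypothetical profile entry; all keys and labels are assumptions.
profile_entry = {
    "crop": "example-crop",
    "phase": "example-phase",
    "year": 2020,
    "metrics": {
        "r2": r_squared(observed_doy, predicted_doy),
        "rmse_days": rmse(observed_doy, predicted_doy),
    },
    "provenance": {"source": "hypothetical-publication-id"},
}

profile_json = json.dumps(profile_entry, indent=2)
```

Keeping the metrics and the provenance pointer in one record is one way to let a downstream query trace each number back to the document it was extracted from.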

These quality, uncertainty, and fitness profiles are then packaged as FAIR Digital Objects, using Annotated Research Contexts (ARCs) as interoperable containers for version‑controlled, reproducible workflows and the RO‑Crate specification for structured research object metadata. This packaging enables integration with research data management infrastructure and discovery systems. It ensures that quality reasoning, local uncertainty estimates, and application contexts travel together with the phenology data through the research lifecycle, preserving provenance and enabling automated quality‑aware dataset selection.
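To make the packaging step concrete, the sketch below assembles a skeleton `ro-crate-metadata.json` that lists a data file, an uncertainty layer, and a quality profile as parts of one root dataset. The `@context`, metadata descriptor, and root dataset entries follow the general layout of RO‑Crate 1.1; the file names, the crate name, and the helper `build_crate_metadata` are hypothetical, and root properties that a complete crate should carry (such as license and publication date) are omitted because the abstract does not specify them.

```python
import json


def build_crate_metadata(name, parts):
    """Return a skeleton RO-Crate 1.1 metadata document as a dict."""
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        # The root dataset references each packaged file by its identifier.
        "hasPart": [{"@id": part["@id"]} for part in parts],
    }
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root, *parts],
    }


# Hypothetical file names; the real package layout is not given in the abstract.
parts = [
    {"@id": "phenology_2020.tif", "@type": "File", "name": "Phenology raster"},
    {"@id": "uncertainty_2020.tif", "@type": "File", "name": "Local uncertainty layer"},
    {
        "@id": "as_dqm_profile.json",
        "@type": "File",
        "name": "Application-specific data quality profile",
        "encodingFormat": "application/json",
    },
]

crate = build_crate_metadata("Phenology data with quality profile (illustrative)", parts)
metadata_json = json.dumps(crate, indent=2)
```

Listing the uncertainty layer and the quality profile under the same `hasPart` as the data is what lets them travel together as a single research object.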

This poster presents a transferable template for domain-specific FAIR implementation. It demonstrates that structured uncertainty reporting, ISO-compliant quality metrics, LLM-assisted formalization of fitness-for-purpose information, and user-centered fitness-for-purpose assessments are essential bridges between abstract FAIR principles and practical, cross-disciplinary data reuse. In practice, users can ask not only "where are data FAIR?" but also "where are data sufficiently accurate, well‑validated, and uncertainty‑constrained for this specific decision context?" By embedding LLM‑derived quality knowledge, uncertainty products, and an application matrix into machine‑actionable FAIR Digital Objects, we move from static compliance towards dynamic, evidence‑based fitness‑for‑purpose assessment, thereby strengthening trust in public datasets.
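A quality-aware query of this kind could, under simple assumptions, reduce to filtering profiles against thresholds set by the decision context. The sketch below is illustrative only: the `QualityProfile` fields, the `fit_for_purpose` helper, the use-context labels, and all numbers are hypothetical and do not reflect the actual AS‑DQM schema or dataset values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityProfile:
    """Hypothetical, simplified view of one dataset's quality profile."""
    dataset_id: str
    rmse_days: float                # global producer-side error
    local_uncertainty_days: float   # uncertainty at the user's location
    documented_uses: frozenset      # use contexts found in the documentation


def fit_for_purpose(profiles, use, max_rmse_days, max_local_uncertainty_days):
    """Keep profiles that document the use and meet both thresholds."""
    return [
        p for p in profiles
        if use in p.documented_uses
        and p.rmse_days <= max_rmse_days
        and p.local_uncertainty_days <= max_local_uncertainty_days
    ]


# Made-up profiles for illustration only.
profiles = [
    QualityProfile("dataset-a", 4.0, 3.0, frozenset({"yield-modelling"})),
    QualityProfile("dataset-b", 9.0, 3.0, frozenset({"yield-modelling"})),
    QualityProfile("dataset-c", 4.0, 3.0, frozenset({"pest-forecasting"})),
]

selected = fit_for_purpose(
    profiles,
    use="yield-modelling",
    max_rmse_days=5.0,
    max_local_uncertainty_days=5.0,
)
selected_ids = [p.dataset_id for p in selected]
```

Here only `dataset-a` passes: `dataset-b` exceeds the error threshold and `dataset-c` has no documented use in the requested context.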

How to cite: Möller, M., Hedayat Mahmoudi, M., and Peschel, P.: Operationalizing Data Fitness-for-Purpose Through Standardized Metrics, Local Uncertainty, and LLM-Extracted Quality Reasoning, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-18803, https://doi.org/10.5194/egusphere-egu26-18803, 2026.