- 1Faculty of Geoinformation Science and Earth Observation, University of Twente, Enschede, Netherlands (f.campomanes@utwente.nl)
- 2Faculty of Behavioural, Management and Social Sciences, University of Twente, Enschede, Netherlands
The integration of Earth Observation (EO) data with machine learning (ML) has transformed the mapping of Deprived Urban Areas (DUA). Despite these technical advances, a persistent disconnect remains between research outputs and their operational uptake by local stakeholders. In parallel, advances in ML and deep learning (DL), together with new satellite missions, have improved the extraction of building footprints and urban morphology. Nevertheless, DUA mapping studies, which largely depend on these physical indicators, often prioritize benchmark performance over the robustness, transparency, or usability required in real-world decision-making contexts. One of the main reasons for this gap is spatial data quality (SDQ), which fundamentally limits model performance and generalization. When data quality is poor, due to inaccuracies, incompleteness, or inadequate provenance, models become unreliable, regardless of architectural complexity. Furthermore, many studies rely on validation strategies that ignore spatial autocorrelation, thereby yielding overly optimistic accuracy estimates that mask poor generalization to new local contexts.
To address these challenges, this paper argues for a shift toward a systematic assessment of spatial data quality. We first conduct a scoping review of 50 state-of-the-art DUA mapping studies published between 2017 and 2025. Our analysis reveals a high dependence on very-high-resolution imagery (72%), a widespread lack of publicly accessible data and code (92%), and a critical deficiency in operationalizing semantic definitions of DUAs, with 90% of studies failing to provide mapping rules (for visual interpretation) or ground rules (for in-situ collection). Most studies also fail to assess user needs (90%) or to consider the ethical implications of using DUA data (88%), which are highly sensitive because of risks such as forced evictions. Building on these findings and established international standards from ISO and the OGC, we propose a comprehensive Spatial Data Quality (SDQ) framework tailored to transparently document supervised image classification in DUA mapping. This framework integrates established practices, such as adherence to the Findable, Accessible, Interoperable, Reusable (FAIR) principles and assessment of acquisition, measurement, and spatial-temporal quality, with novel dimensions addressing semantic consistency, sampling representativeness, human factors in annotation, learning shortcut risk, user needs validity, ethical considerations, and transparent reporting of the dataset’s potential failure modes or uncertainties. By operationalizing SDQ as a living, extensible framework, this work aims to better align advances in ML and DL with sustained societal impact, ensuring that mapping products, for DUAs or any other relevant application domain, are fit for use by local communities and decision-makers.
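As an illustration of the validation concern raised above (accuracy estimates inflated by spatial autocorrelation), the following is a minimal sketch of spatially blocked cross-validation. It is not the authors' method: the function names `spatial_block_folds` and `train_test_splits`, the square-grid blocking, the block size, and the toy coordinates are all assumptions chosen for illustration. The idea is only that samples falling in the same grid block are always kept together in one fold, so nearby samples cannot appear on both sides of a train/test split.

```python
from collections import defaultdict


def spatial_block_folds(coords, block_size, n_folds):
    """Group sample indices into folds so that all samples in the same
    square grid block (side length `block_size`) share one fold."""
    # Map each grid block to the indices of the samples that fall inside it.
    blocks = defaultdict(list)
    for idx, (x, y) in enumerate(coords):
        key = (int(x // block_size), int(y // block_size))
        blocks[key].append(idx)

    # Hand out whole blocks, largest first, to whichever fold is currently
    # smallest, which keeps fold sizes roughly balanced.
    folds = [[] for _ in range(n_folds)]
    for key in sorted(blocks, key=lambda k: (-len(blocks[k]), k)):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(blocks[key])
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)


# Hypothetical sample locations in projected coordinates (metres).
coords = [(10, 12), (15, 18), (110, 20), (120, 25),
          (20, 130), (25, 140), (130, 135), (140, 150)]

folds = spatial_block_folds(coords, block_size=100, n_folds=2)
for train, test in train_test_splits(folds):
    print("train:", train, "test:", test)
```

Blocking of this kind reduces, but does not remove, leakage between samples that sit close together on either side of a block boundary; the block size would normally be chosen relative to the range of spatial autocorrelation in the data, which this sketch does not estimate.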
How to cite: Campomanes V, F., Kuffer, M., Stein, A., Dijkstra, A. M., Trento Oliveira, L., and Belgiu, M.: A framework for assessing the quality of spatial data applied in supervised image classification of deprived urban areas, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-19025, https://doi.org/10.5194/egusphere-egu26-19025, 2026.