The literature on the Yellow River Basin (YRB) is a critical knowledge base for understanding human-earth systems, yet existing automated review methods that operate at the metadata level suffer from deep semantic loss and deficiencies in spatial representation: they neither capture fine-grained logic chains from full texts nor extract the spatial and hierarchical attributes of geographic entities. Rapid developments in Large Language Models (LLMs) now provide a technological opportunity for the automated extraction of full-text knowledge. To this end, this study proposes the Geo-Knowledge Infused Reasoning Framework (GK-IRF), which couples full-text semantics with multi-level spatial indexing. Methodologically, we first construct an ontology-based full-text parsing mechanism from 8,493 YRB-related papers (2015–2024), using LLMs to accurately extract structured semantic triplets. In parallel, we introduce an adaptive multi-level GeoHash indexing model that maps textual toponyms into hierarchically nested grid sets, reconstructing the spatial coverage and multi-scale associations of geographic entities. Validation against a manually annotated dataset indicates that GK-IRF achieves an F1-score comparable to human performance in full-granularity semantic extraction; furthermore, the Spatial Coverage Accuracy of the multi-level grids for the YRB substantially outperforms that of traditional geocoding methods, effectively resolving the challenge of multi-scale coverage representation.
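As an illustration of the multi-level indexing idea only, the following minimal Python sketch maps a toponym's bounding box to nested GeoHash cell sets. It is not the GK-IRF implementation: the function names, the `max_cells` stopping rule used to make the level choice "adaptive", and the approximate YRB bounding box (32–42°N, 96–119°E) are all assumptions made for this example. Only the GeoHash encoding itself follows the standard public algorithm.

```python
import math

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard GeoHash alphabet


def geohash_encode(lat, lon, precision):
    """Standard GeoHash: interleave longitude/latitude bisection bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, value, use_lon = [], 0, 0, True
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        value = (value << 1) | int(bit)
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def cover_bbox(south, west, north, east, precision):
    """Return the set of GeoHash cells at `precision` that intersect the box."""
    total_bits = 5 * precision
    lat_bits, lon_bits = total_bits // 2, (total_bits + 1) // 2
    rows, cols = 1 << lat_bits, 1 << lon_bits
    h, w = 180.0 / rows, 360.0 / cols  # cell height and width in degrees
    r0 = int(math.floor((south + 90.0) / h))
    r1 = min(int(math.floor((north + 90.0) / h)), rows - 1)
    c0 = int(math.floor((west + 180.0) / w))
    c1 = min(int(math.floor((east + 180.0) / w)), cols - 1)
    cells = set()
    for r in range(r0, r1 + 1):
        for c in range(c0, c1 + 1):
            # Encode the cell centre so boundary rounding cannot shift the cell.
            lat = -90.0 + (r + 0.5) * h
            lon = -180.0 + (c + 0.5) * w
            cells.add(geohash_encode(lat, lon, precision))
    return cells


def adaptive_cover(south, west, north, east, max_cells=64, max_precision=6):
    """Hypothetical adaptive rule: refine until a level exceeds `max_cells`."""
    levels = {}
    for p in range(1, max_precision + 1):
        cells = cover_bbox(south, west, north, east, p)
        if levels and len(cells) > max_cells:
            break
        levels[p] = cells
    return levels


if __name__ == "__main__":
    # Approximate YRB extent (assumed for illustration).
    for level, cells in adaptive_cover(32.0, 96.0, 42.0, 119.0).items():
        print(level, len(cells), sorted(cells)[:5])
```

Because each GeoHash string is a prefix of the strings of the cells it contains, the cell sets returned for successive levels are hierarchically nested: a large entity such as a basin is represented coarsely, while a county-scale toponym under the same stopping rule would reach a finer level.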
How to cite: Wu, S. and Wang, H.: Coupling Full-Text Semantics with Multi-Level Spatial Indexing: A Knowledge Representation Framework for Yellow River Basin Literature, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-6201, https://doi.org/10.5194/egusphere-egu26-6201, 2026.