EGU26-8402, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-8402
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
PICO | Wednesday, 06 May, 16:26–16:28 (CEST)
 
PICO spot 2, PICO2.4
Buildings as Text: A Universal Regression Paradigm for Building Attribute Prediction
Chih-Chi Wang1 and Peng Luo2
  • 1Department of Aerospace and Geodesy, Technical University of Munich, Germany (chi964.wang@tum.de)
  • 2Department of Urban Studies and Planning, Massachusetts Institute of Technology, USA (pengluo@mit.edu)

Predicting building attributes, such as functional classification, socioeconomic status, and energy efficiency, is a fundamental task in urban science. The prevailing approach uses domain knowledge to extract attribute-specific morphological or topological features for supervised modeling. However, this heavy reliance on manual feature engineering yields task-specific models whose features must be redefined for each attribute. Consequently, the field lacks a unified, generalizable framework for multi-attribute building prediction.

Inspired by recent advances in Regression Language Models (RLMs), which cast continuous prediction as a text-to-text task, we propose Buildings as Text (BaT). BaT serializes structured building representations (e.g., GeoJSON) into raw text and enables end-to-end text-to-text regression. To mitigate the spatial sensitivity of building data, we introduce a Topology-Preserved Coordinate (TPC) strategy that removes absolute positional information from each building's text. Specifically, TPC applies a global coordinate shift to the serialized geometry, suppressing absolute-location bias while preserving local shape and topology. By operating directly on raw text, BaT eliminates manual feature engineering and allows the model to learn a “spatial syntax” from the underlying geometric descriptions.
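
To make the serialization and coordinate-shift idea concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not state how the shift is chosen, so the sketch assumes the offset is the minimum x and y over the geometry being serialized. The function names (`iter_positions`, `shift_coordinates`, `serialize_building`) and the rounding precision are hypothetical choices made for illustration only.

```python
import json


def iter_positions(coords):
    """Yield every [x, y] position from a nested GeoJSON coordinate array."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for child in coords:
            yield from iter_positions(child)


def shift_coordinates(coords, dx, dy, ndigits=6):
    """Subtract one offset (dx, dy) from every position, keeping the nesting."""
    if coords and isinstance(coords[0], (int, float)):
        return [round(coords[0] - dx, ndigits), round(coords[1] - dy, ndigits)]
    return [shift_coordinates(child, dx, dy, ndigits) for child in coords]


def serialize_building(feature, ndigits=6):
    """Return a compact text form of a GeoJSON feature's geometry.

    Assumption: the offset is the minimum x and y of the geometry itself.
    Because the same offset is applied to every vertex, side lengths, angles,
    and the relative placement of rings are unchanged; only the absolute
    location is dropped.
    """
    geom = feature["geometry"]
    positions = list(iter_positions(geom["coordinates"]))
    dx = min(p[0] for p in positions)
    dy = min(p[1] for p in positions)
    shifted = {
        "type": geom["type"],
        "coordinates": shift_coordinates(geom["coordinates"], dx, dy, ndigits),
    }
    return json.dumps(shifted, separators=(",", ":"))


# Usage: a 10 x 10 square footprint in projected coordinates (illustrative data).
square = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[1000.0, 2000.0], [1010.0, 2000.0], [1010.0, 2010.0],
                         [1000.0, 2010.0], [1000.0, 2000.0]]],
    },
}
text = serialize_building(square)
```

Under this assumption, two footprints with the same shape at different locations serialize to the same string, which is one way to read "suppressing absolute-location bias". When several buildings are serialized together, the same single offset would need to be applied to all of them so that their relative positions survive.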

We validate the BaT framework through a case study on informal settlement (slum) classification. The results show that our model outperforms traditional morphology-based methods and adapts more readily to the task. Although validated only on slum detection, this research offers a universal and scalable paradigm for urban building analysis, suggesting that Large Language Models can effectively “read” urban forms for diverse prediction tasks beyond specific domains.

How to cite: Wang, C.-C. and Luo, P.: Buildings as Text: A Universal Regression Paradigm for Building Attribute Prediction, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-8402, https://doi.org/10.5194/egusphere-egu26-8402, 2026.