- ¹Universitat de València, Image and Signal Processing, Electronic Engineering, Burjassot, Spain (oscar.pellicer@uv.es)
- ²Asterisk Labs, London, England (miko@asterisk.coop)
The field of Artificial Intelligence for Earth Observation (AI4EO) currently suffers from significant data friction, especially when moving petabyte-scale archives from cloud object storage to High-Performance Computing (HPC) nodes. We present TACO (Transparent Access to Cloud-Optimized datasets), a production-grade standard designed to replace legacy file-centric workflows with a high-throughput streaming paradigm.
We demonstrate the practical implications of this architecture for deploying geospatial Foundation Models (FMs) by running pretrained FMs on downstream inference tasks (such as semantic segmentation or land-cover classification) directly on arbitrary samples from any cloud-hosted dataset, quickly and without local staging or dataset-specific preprocessing. TACO bridges the gap between static cloud archives and dynamic HPC processing, enabling seamless, scalable AI4EO workflows and finally delivering on the long-standing promise of FMs: "train once, apply everywhere".
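The core idea of streaming instead of staging can be illustrated with a minimal, self-contained Python sketch. Everything in it is a hypothetical stand-in, not TACO's actual on-disk format or API: the archive layout (float32 samples packed back to back, plus an `(offset, length)` index), the `InMemoryObjectStore` class mimicking a cloud object that honours HTTP-style byte-range reads, and `toy_model` standing in for a pretrained FM. It shows a client pulling only the bytes of the samples it needs rather than downloading the whole archive.

```python
import struct
from dataclasses import dataclass


@dataclass
class InMemoryObjectStore:
    """Hypothetical stand-in for a cloud object (e.g. a blob in object storage)
    that supports byte-range reads, like an HTTP `Range: bytes=start-end` request."""
    blob: bytes
    bytes_transferred: int = 0

    def read_range(self, start: int, end: int) -> bytes:
        # Inclusive end offset, mirroring HTTP Range semantics.
        chunk = self.blob[start:end + 1]
        self.bytes_transferred += len(chunk)
        return chunk


def build_archive(samples: dict[str, list[float]]) -> tuple[bytes, dict[str, tuple[int, int]]]:
    """Pack float32 samples back to back and record (offset, length) per sample.
    Hypothetical layout for illustration only, not the TACO format."""
    blob = bytearray()
    index = {}
    for sample_id, values in samples.items():
        payload = struct.pack(f"<{len(values)}f", *values)
        index[sample_id] = (len(blob), len(payload))
        blob.extend(payload)
    return bytes(blob), index


def stream_sample(store: InMemoryObjectStore, index: dict[str, tuple[int, int]], sample_id: str) -> list[float]:
    """Fetch and decode only the bytes that belong to one sample."""
    offset, length = index[sample_id]
    raw = store.read_range(offset, offset + length - 1)
    return list(struct.unpack(f"<{length // 4}f", raw))


def toy_model(pixels: list[float]) -> str:
    """Hypothetical stand-in for a pretrained FM head: labels a sample
    'water' if its mean reflectance is low, otherwise 'land'."""
    return "water" if sum(pixels) / len(pixels) < 0.1 else "land"


# 100 tiles of 1024 float32 values each; even tiles are dark (0.0), odd tiles bright (0.2).
samples = {f"tile_{i:03d}": [0.2 * (i % 2)] * 1024 for i in range(100)}
blob, index = build_archive(samples)
store = InMemoryObjectStore(blob)

# Run inference on two arbitrary samples without staging the archive.
results = {sid: toy_model(stream_sample(store, index, sid)) for sid in ["tile_007", "tile_042"]}
print(results)  # {'tile_007': 'land', 'tile_042': 'water'}
print(f"transferred {store.bytes_transferred} of {len(blob)} bytes")  # transferred 8192 of 409600 bytes
```

Only two tiles' worth of bytes (8192 of 409600) are read. Against a real object store, the same access pattern maps onto HTTP range requests, which is also how cloud-optimized formats such as Cloud Optimized GeoTIFF (COG) enable partial reads.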
How to cite: Pellicer-Valero, O. J., Aybar, C., Czerkawski, M., Oliver, C., Monsálvez, K., Contreras, J., and Camps-Valls, G.: TACO: Operationalizing AI-Ready EO datasets, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-8000, https://doi.org/10.5194/egusphere-egu26-8000, 2026.