Disentanglement of Structure and Texture Representations as a Method of Self-Supervision for Earth Observation Data: A Case Study on Cloud Type

Mikolaj Czerkawski; Alistair Francis; Paul Borne--Pons; Barbara Bertozzi; Jacqueline Campbell

https://doi.org/10.5194/egusphere-egu26-7605

Session ESSI1.4

EGU26-7605, updated on 14 Mar 2026

EGU General Assembly 2026

© Author(s) 2026. This work is distributed under the Creative Commons Attribution 4.0 License.

Asterisk Labs, London, United Kingdom

Self-supervised learning has become a prominent technique for representation learning in Earth observation, largely because of the vast volumes of unlabelled data held in observation archives. However, apart from masked autoencoding (MAE) and contrastive learning, the range of self-supervised learning schemes explored for geospatial data in the existing literature remains limited.

This work explores structure and texture disentanglement as an alternative route to self-supervised learning in Earth observation. Inspired by the Swapping Autoencoder architecture, the pipeline uses an encoder that extracts disentangled structural and textural embeddings from an image, and a decoder that reconstructs the image from those embeddings. Crucially, it includes an augmentation step that swaps texture and structure embeddings between different samples to produce hybrid images. This synthetic generation is driven by adversarial training with two discriminators: one assesses whether the image as a whole is real, and the other assesses whether individual patches in the image are consistent with the source texture vector.
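
The abstract does not specify the network layers, so the following is only a toy NumPy sketch of the data flow described above, not the authors' implementation. It assumes per-pixel linear projections in place of convolutional networks. It also keeps the structure map at full resolution, whereas a real encoder would typically downsample it. All names (`encode`, `decode`, `swap_textures`, `random_patches`, the weight matrices and the sizes) are hypothetical. No training or discriminator networks are included; the sketch only builds the tensors each discriminator would receive.

```python
import numpy as np

def encode(images, w_struct, w_tex):
    """Toy encoder (hypothetical stand-in for a convolutional network).

    images: array of shape (N, H, W, C).
    Returns a structure map (N, H, W, Cs) that keeps features in place, and a
    texture vector (N, Ct) that average-pools local features over all positions.
    """
    structure = np.tanh(images @ w_struct)
    texture = np.tanh(images @ w_tex).mean(axis=(1, 2))
    return structure, texture

def decode(structure, texture, w_out):
    """Toy decoder: broadcast the texture vector to every position, then mix."""
    n, h, w, _ = structure.shape
    tex_map = np.broadcast_to(texture[:, None, None, :], (n, h, w, texture.shape[-1]))
    return np.concatenate([structure, tex_map], axis=-1) @ w_out

def swap_textures(texture):
    """Give sample i the texture vector of sample i-1 (cyclically)."""
    return np.roll(texture, shift=1, axis=0)

def random_patches(images, size, k, rng):
    """Crop k random size x size patches per image -> (N, k, size, size, C)."""
    n, h, w, c = images.shape
    out = np.empty((n, k, size, size, c), dtype=images.dtype)
    for i in range(n):
        for j in range(k):
            y = rng.integers(0, h - size + 1)
            x = rng.integers(0, w - size + 1)
            out[i, j] = images[i, y:y + size, x:x + size]
    return out

rng = np.random.default_rng(0)
n, h, w, c, cs, ct = 4, 32, 32, 3, 8, 16  # hypothetical sizes
w_struct = rng.normal(size=(c, cs))
w_tex = rng.normal(size=(c, ct))
w_out = rng.normal(size=(cs + ct, c))

images = rng.random((n, h, w, c))
structure, texture = encode(images, w_struct, w_tex)
recon = decode(structure, texture, w_out)                  # compared with images by a reconstruction loss
hybrid = decode(structure, swap_textures(texture), w_out)  # scored by the whole-image discriminator

# Inputs to the patch discriminator: patches of each hybrid, plus reference
# patches from the image that supplied its texture vector.
hybrid_patches = random_patches(hybrid, 8, 4, rng)
reference_patches = random_patches(np.roll(images, shift=1, axis=0), 8, 4, rng)
print(hybrid.shape, hybrid_patches.shape)  # (4, 32, 32, 3) (4, 4, 8, 8, 3)
```

In this toy decoder, swapping the texture vector shifts every pixel of a sample by the same amount, and the spatial layout stays with the structure map. A trained network would learn a far richer mapping, but the split of roles is the same.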

The texture embedding extracted from the image acts as a global vector describing the aggregated statistics of local features, while the structure embedding represents how these features are distributed in space. This preliminary work tests the approach in a domain where image labels are particularly scarce: cloud formation types in high-resolution optical imagery. As part of the Clouds Decoded project, the pipeline is applied to a large collection of cloudy Sentinel-2 images with the goal of identifying observational clusters of cloud formations that share similar properties. The work introduces a foundational architecture for this framework, together with several analysis methods that make use of the trained deep neural network.
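
The abstract does not say which clustering algorithm is used to find groups of cloud formations. Purely as an illustration, the sketch below assumes plain k-means (Lloyd's algorithm) with farthest-point initialisation, applied to texture vectors. The function `kmeans` is hypothetical, and the two-group embeddings are synthetic placeholders, not Sentinel-2 results.

```python
import numpy as np

def kmeans(x, k, iters=100, seed=0):
    """Lloyd's k-means on row vectors x of shape (N, D).

    Uses farthest-point initialisation: start from a random row, then
    repeatedly add the row farthest from all centres chosen so far.
    """
    rng = np.random.default_rng(seed)
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        dist = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[dist.argmax()])
    centers = np.array(centers)

    for _ in range(iters):
        # Squared distance from every embedding to every centre: (N, k).
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        # Move each centre to the mean of its members (keep it if empty).
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the final centres.
    d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1), centers

# Synthetic stand-in for texture embeddings: two well-separated groups.
rng = np.random.default_rng(1)
texture = np.vstack([rng.normal(0.0, 0.1, (20, 16)), rng.normal(1.0, 0.1, (20, 16))])
labels, centers = kmeans(texture, k=2)
```

This sketch clusters the texture vectors rather than the structure maps. Because a texture vector summarises local feature statistics regardless of where those features occur, scenes with similar cloud texture but different layouts would fall into the same group.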

How to cite: Czerkawski, M., Francis, A., Borne--Pons, P., Bertozzi, B., and Campbell, J.: Disentanglement of Structure and Texture Representations as a Method of Self-Supervision for Earth Observation Data: A Case Study on Cloud Type, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7605, https://doi.org/10.5194/egusphere-egu26-7605, 2026.