Super-resolution for satellite imagery: uncovering details using a new Cross Band Transformer architecture

Jasper S. Wijnands; Nikolaos Ntantis; Jan Fokke Meirink; Domenica Dibenedetto

doi:https://doi.org/10.5194/egusphere-egu24-340

Session GI2.4

EGU24-340, updated on 08 Mar 2024

EGU General Assembly 2024

© Author(s) 2024. This work is distributed under the Creative Commons Attribution 4.0 License.

Jasper S. Wijnands¹, Nikolaos Ntantis¹,², Jan Fokke Meirink¹, and Domenica Dibenedetto²

¹Royal Netherlands Meteorological Institute (KNMI), De Bilt, The Netherlands (jasper.wijnands@knmi.nl)
²Maastricht University, Maastricht, The Netherlands

Recent advances in artificial intelligence (AI) techniques have enabled the processing and analysis of vast datasets, such as archives of satellite observations. In the geosciences, remote sensing has transformed the way in which the atmosphere and surface are observed. Traditionally, substantial funding is directed towards the development of new satellites to improve observation accuracy. Nowadays, novel methods based on AI could become a complementary approach to further enhance the resolution of observations. Therefore, we developed a new, state-of-the-art super-resolution methodology.

Satellites commonly measure electromagnetic radiation, reflected or emitted by the Earth's surface and atmosphere, in different parts of the spectrum. Many instruments capture both panchromatic (PAN) and low-resolution multi-spectral (LRMS) images. While PAN typically covers a broad spectral range, LRMS focuses on details in narrow bands within that range. Pansharpening is the task of fusing the spatial details of PAN with the spectral richness of LRMS to obtain high-resolution multi-spectral (HRMS) images. This has proven valuable in many areas of the geosciences, leading to new capabilities such as detecting small-sized marine plastic litter and identifying buried archaeological remains. Although HRMS images are not directly captured by the satellite, they can provide enhanced visual clarity, uncover intricate patterns and allow for more accurate and detailed analyses.
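To make the inputs and output of the pansharpening task concrete, the following is a minimal sketch of a classical ratio-based (Brovey-style) baseline. It is not the method of this study; it only illustrates how a PAN image and an LRMS image can be fused into an HRMS estimate. The function name, the array layout (bands, height, width), the nearest-neighbour upsampling and the unweighted band mean used as intensity are all assumptions made for illustration.

```python
import numpy as np


def brovey_pansharpen(lrms, pan, eps=1e-8):
    """Classical ratio-based (Brovey-style) pansharpening baseline.

    lrms: (bands, h, w) low-resolution multi-spectral image
    pan:  (H, W) panchromatic image with H = scale * h and W = scale * w
    Returns an HRMS estimate of shape (bands, H, W).
    """
    bands, h, w = lrms.shape
    H, W = pan.shape
    if H % h or W % w or H // h != W // w:
        raise ValueError("PAN size must be an integer multiple of LRMS size")
    scale = H // h
    # Nearest-neighbour upsampling of each band onto the PAN grid.
    ms_up = lrms.repeat(scale, axis=1).repeat(scale, axis=2)
    # Synthetic intensity: unweighted mean over bands (an assumption).
    intensity = ms_up.mean(axis=0)
    # Inject PAN detail by rescaling every band with the PAN/intensity ratio.
    return ms_up * (pan / (intensity + eps))


# Synthetic example data (random values, not satellite observations).
rng = np.random.default_rng(0)
lrms = rng.uniform(0.1, 1.0, size=(4, 16, 16))  # 4 bands on a coarse grid
pan = rng.uniform(0.1, 1.0, size=(64, 64))      # single band on a 4x finer grid
hrms = brovey_pansharpen(lrms, pan)
print(hrms.shape)
```

In this baseline the band mean of the output reproduces the PAN image, while the ratios between bands are inherited from the upsampled LRMS image. Learned methods such as the one described in this abstract replace this fixed rule with a trained network.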

Technically, pansharpening is closely related to the single image super-resolution task, where attention-based models have achieved excellent results. In our study, a new Cross Band Transformer (CBT) for pansharpening was developed, incorporating and adapting successful features of vision transformer architectures. Information sharing between the panchromatic and multi-spectral input streams was enabled through two novel components: the Shifted Cross-Band Attention Block and the Overlapping Cross-Band Attention Block, implementing mechanisms for shifted and overlapping cross-attention. Each block led to a more accurate fusion of panchromatic and multi-spectral data. For evaluation, CBT was compared with seven competitive benchmark methods, including MDCUN, PanFormer and ArbRPN. Our model produced state-of-the-art results on the widely used GaoFen-2 and WorldView-3 pansharpening datasets. Based on peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) scores of the generated images, CBT outperformed all benchmark methods. Our AI method can be integrated into existing remote sensing pipelines, as CBT converts actual observations into a high-resolution equivalent for use in downstream tasks. A PyTorch implementation of CBT is available at https://github.com/VisionVoyagerX/CBT.
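As a rough illustration of two ingredients named above, the sketch below shows (i) generic single-head scaled dot-product cross-attention, in which queries come from one stream and keys and values from the other, and (ii) the standard PSNR definition. It does not reproduce the Shifted or Overlapping Cross-Band Attention Blocks, whose windowing details are not given in this abstract; the token counts, the embedding size and the random projection matrices are hypothetical, and the actual implementation is the PyTorch code in the repository linked above.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(q_tokens, kv_tokens, w_q, w_k, w_v):
    """Single-head cross-attention between two token streams."""
    q = q_tokens @ w_q    # queries from the first stream,  (n_q, d)
    k = kv_tokens @ w_k   # keys from the second stream,    (n_kv, d)
    v = kv_tokens @ w_v   # values from the second stream,  (n_kv, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each query row sums to 1
    return weights @ v, weights


def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, data_range]."""
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)


# Hypothetical sizes: 16 tokens per stream, embedding dimension 8.
rng = np.random.default_rng(0)
dim = 8
ms_tokens = rng.normal(size=(16, dim))   # stand-in for multi-spectral tokens
pan_tokens = rng.normal(size=(16, dim))  # stand-in for panchromatic tokens
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))
fused, weights = cross_attention(ms_tokens, pan_tokens, w_q, w_k, w_v)

# PSNR check on a constant offset of 0.1: MSE = 0.01, so PSNR is 20 dB.
target = np.full((4, 4), 0.5)
psnr_db = psnr(target, target + 0.1)
print(fused.shape, round(psnr_db, 3))
```

Swapping the roles of the two streams (queries from PAN, keys and values from the multi-spectral tokens) gives attention in the opposite direction; how the two directions are combined in CBT is not specified here.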

Furthermore, we developed the Sev2Mod dataset, available at https://zenodo.org/record/8360458. Unlike conventional benchmark datasets, Sev2Mod draws its input and target pairs from two different satellite instruments: (i) SEVIRI onboard the Meteosat Second Generation (MSG) satellite in geostationary orbit and (ii) MODIS onboard the Terra satellite in polar, sun-synchronous orbit. SEVIRI measures a fixed field of view quasi-continuously, while MODIS passes only twice a day but observes at a much higher spatial resolution. Our study investigated image generation at the spatial resolution of MODIS, while preserving SEVIRI's high temporal resolution. Since Sev2Mod is better aligned with the conditions one may encounter when applying pansharpening methods in practice (e.g., noise, bias, approximate temporal matching), it provides a solid foundation for designing robust pansharpening models for real-world applications.
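The approximate temporal matching mentioned above can be pictured as pairing each target acquisition with the nearest input acquisition in time. The sketch below illustrates that idea only and is not the procedure used to build Sev2Mod: the function name, the 15-minute spacing of the input timestamps, the example times and the 7.5-minute tolerance are all assumptions.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest_within(sorted_times, target, tolerance):
    """Return the entry of sorted_times closest to target.

    Returns None when the closest entry is further away than tolerance.
    sorted_times must be sorted in ascending order.
    """
    i = bisect_left(sorted_times, target)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = sorted_times[max(i - 1, 0):i + 1]
    if not candidates:
        return None
    best = min(candidates, key=lambda t: abs(t - target))
    return best if abs(best - target) <= tolerance else None


# Hypothetical input timestamps every 15 minutes from 10:00 to 11:45.
start = datetime(2020, 1, 1, 10, 0)
input_times = [start + timedelta(minutes=15 * k) for k in range(8)]

# Hypothetical target acquisition time.
target_time = datetime(2020, 1, 1, 10, 38)
tolerance = timedelta(minutes=7, seconds=30)
match = nearest_within(input_times, target_time, tolerance)
print(match)  # 2020-01-01 10:45:00
```

Any residual time offset between the matched pair remains in the data, which is one reason such pairs are harder to learn from than pairs simulated from a single instrument.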

How to cite: Wijnands, J. S., Ntantis, N., Meirink, J. F., and Dibenedetto, D.: Super-resolution for satellite imagery: uncovering details using a new Cross Band Transformer architecture, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-340, https://doi.org/10.5194/egusphere-egu24-340, 2024.