EGU General Assembly 2023
© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Towards generation of synthetic hyperspectral image datasets with GAN

François De Vieilleville1, Adrien Lagrange1, Nicolas Dublé1, and Bertrand Le Saux2
  • 1AGENIUM Space, TOULOUSE, France

In the context of the CORTEX project, a study was carried out to build a method for generating synthetic images with associated labels for hyperspectral use cases. Such a method is valuable when too little annotated data is available to train a deep neural network (DNN). Hyperspectral imaging is particularly suited to this problem, since labeled hyperspectral datasets are scarce and generally very small.

The first step of the project was therefore to define a suitable hyperspectral use case for the study. Training generative models requires a set of hyperspectral images with their associated ground truth. A dataset was created from PRISMA images paired with the IGN BD Forêt v2 database, yielding a segmentation dataset of 1268 images of 256x256 pixels with 234 spectral bands. The ground truth comprises 4 classes: not-forest, broad-leaved forest, coniferous forest and mixed forest. To correctly match the ground truth with the images, significant effort was devoted to improving the geolocation of the PRISMA images by coregistering patches with Sentinel-2 images. We want to highlight the value of this database, which remains, to our knowledge, one of the few large-scale hyperspectral databases, and which is made available on the Zenodo platform.
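The coregistration step can be illustrated with FFT-based phase correlation, a standard technique for estimating the translation between two image patches. This is a minimal numpy sketch of the general idea, not the pipeline used in CORTEX; the function name is illustrative.

```python
import numpy as np

def phase_correlation_shift(reference, target):
    """Estimate the integer (row, col) translation that maps the
    reference patch onto the target patch via phase correlation."""
    F_ref = np.fft.fft2(reference)
    F_tgt = np.fft.fft2(target)
    cross_power = F_tgt * np.conj(F_ref)
    cross_power /= np.abs(cross_power) + 1e-12
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    shifts = np.array(peak, dtype=int)
    # Wrap offsets larger than half the patch size to negative values
    for axis, size in enumerate(correlation.shape):
        if shifts[axis] > size // 2:
            shifts[axis] -= size
    return (int(shifts[0]), int(shifts[1]))

# Toy example: shift a random patch by (3, -2) and recover the offset
rng = np.random.default_rng(0)
ref = rng.random((64, 64))
shifted = np.roll(ref, shift=(3, -2), axis=(0, 1))
print(phase_correlation_shift(ref, shifted))  # (3, -2)
```

In practice, a PRISMA patch would play the role of `reference` and a resampled Sentinel-2 band the role of `target` (or vice versa), with subpixel refinement on top of the integer estimate.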

Then, a segmentation model was trained on the dataset to assess its quality and the feasibility of forest-type segmentation. Good results were obtained with a Unet-EfficientNet segmentation DNN, showing that the dataset is coherent, although the problem remains difficult: the 'mixed forest' class is still challenging to identify.
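A per-class intersection-over-union (IoU) metric is the usual way to see that one class, such as 'mixed forest', lags behind the others. The following numpy sketch is a generic implementation of the metric, not the evaluation code used in the study.

```python
import numpy as np

CLASSES = ["not-forest", "broad-leaved", "coniferous", "mixed"]

def per_class_iou(pred, target, n_classes=4):
    """Intersection-over-union for each class of a segmentation map.

    pred, target: integer arrays of class indices with the same shape.
    Returns a list of one IoU per class (NaN if the class is absent
    from both prediction and ground truth).
    """
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        ious.append(float(inter / union) if union else float("nan"))
    return ious

# Toy 4x4 example with one mislabeled pixel
target = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [2, 2, 3, 3],
                   [2, 2, 3, 3]])
pred = target.copy()
pred[0, 0] = 1  # one not-forest pixel predicted as broad-leaved
print(per_class_iou(pred, target))  # [0.75, 0.8, 1.0, 1.0]
```

Reporting these four numbers per class, rather than a single accuracy, is what exposes the weakness on the 'mixed forest' class.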

Finally, substantial research was conducted to develop a Generative Adversarial Network (GAN) method able to generate synthetic hyperspectral images. The state-of-the-art StyleGAN2 was modified for this purpose: an additional discriminator was added, tasked with discriminating synthetic from real images in a reduced image space. Good results were obtained for the generation of 32-band images, but results worsened as the number of bands increased; the difficulty of the problem appears directly linked to the number of bands to be generated.
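One simple way to obtain such a reduced image space is to collapse the spectral dimension by averaging contiguous band groups before feeding the auxiliary discriminator. The sketch below illustrates that idea only; the actual reduction used in the study is not specified in this abstract, and `reduce_bands` is an illustrative name.

```python
import numpy as np

def reduce_bands(cube, n_out=3):
    """Collapse a (H, W, B) hyperspectral cube to n_out channels by
    averaging contiguous band groups, a simple stand-in for the
    reduced image space an auxiliary discriminator could operate on."""
    groups = np.array_split(np.arange(cube.shape[-1]), n_out)
    return np.stack([cube[..., g].mean(axis=-1) for g in groups], axis=-1)

# A toy cube with the 234 PRISMA bands mentioned above
cube = np.random.default_rng(1).random((8, 8, 234))
print(reduce_bands(cube).shape)  # (8, 8, 3)
```

Both real and generated cubes would pass through the same fixed reduction, so the auxiliary discriminator only has to judge realism in a low-band space while the main discriminator still sees the full cube.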

The final goal was to generate synthetic ground-truth masks alongside the images, and the SemanticGAN method was selected to address this problem. Since this method is based on StyleGAN2, the StyleGAN2 improvements developed for hyperspectral images were incorporated into it. In the end, a modified version of SemanticGAN was proposed: the discriminator assessing the coherence between masks and images was modified to use a reduced-dimension image, and a specific training strategy was introduced to aid convergence. The initial expectation was that generating masks would help stabilize the generation of images, but the experiments showed the contrary. Early results are promising, but more research will be necessary to obtain pairs of images and masks that could be used to train a DNN.
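A coherence discriminator of this kind typically receives the image and the mask stacked along the channel axis. As a rough numpy sketch, combining the reduced image with a one-hot encoding of the mask could look like the following; the function names and channel layout are assumptions for illustration, not the authors' architecture.

```python
import numpy as np

def one_hot(mask, n_classes=4):
    """(H, W) integer class mask -> (H, W, n_classes) one-hot encoding."""
    return np.eye(n_classes)[mask]

def joint_input(reduced_image, mask, n_classes=4):
    """Stack a reduced image with the one-hot mask so that a single
    discriminator can judge image/mask coherence."""
    return np.concatenate([reduced_image, one_hot(mask, n_classes)], axis=-1)

# Toy example: a 3-channel reduced image and a 4-class mask
img = np.zeros((8, 8, 3))
mask = np.zeros((8, 8), dtype=int)
print(joint_input(img, mask).shape)  # (8, 8, 7)
```

Working on the reduced image keeps the coherence discriminator's input small even when the generated cube has hundreds of bands.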

How to cite: De Vieilleville, F., Lagrange, A., Dublé, N., and Le Saux, B.: Towards generation of synthetic hyperspectral image datasets with GAN, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-7299, 2023.