EGU24-11077, updated on 08 Mar 2024
https://doi.org/10.5194/egusphere-egu24-11077
EGU General Assembly 2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Synthetic Data and AI - Teaching a Neural Network to Identify Clouds Despite the Lack of Annotated Observation Data

Ronald Scheirer¹, Aleksis Pirinen², Nosheen Abid³, Nuria Agues Paszkowsky², Thomas Ohlson Timoudas², Chiara Ceccobello⁴, György Kovács³, and Anders Persson⁵
  • ¹Swedish Meteorological and Hydrological Institute
  • ²RISE Research Institutes of Sweden
  • ³Luleå University of Technology
  • ⁴AI Sweden
  • ⁵The Swedish Forest Agency

Clouds are characterized, among other things, by strong variability in time, space and optical thickness. These properties modulate solar radiation (through reflection, transmission and absorption) and can distort the signal from the surface beneath. It is therefore important to detect even optically thin clouds with remote sensing methods, even when the focus is on observing the Earth's surface rather than the clouds themselves.
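As a rough, textbook illustration of why even optically thin clouds matter, consider the Beer–Lambert law for the direct solar beam. It ignores the light the cloud scatters and is not part of the study's method. The direct-beam transmittance through a layer of optical thickness τ at solar zenith angle θ₀ is:

```latex
T_{\mathrm{dir}} = \exp\!\left(-\frac{\tau}{\mu_0}\right), \qquad \mu_0 = \cos\theta_0 ,
\qquad \text{e.g. } \tau = 0.3,\ \theta_0 = 60^\circ:\quad T_{\mathrm{dir}} = e^{-0.6} \approx 0.55 .
```

A cloud with τ = 0.3 therefore already removes roughly 45 % of the direct beam at that sun angle. Much of that energy is scattered rather than absorbed: part of it goes back towards the sensor and part reaches the surface as diffuse light. Both effects alter the measured surface signal.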

This study was initiated by the Swedish Forest Agency (SFA). To limit the spread of bark beetles, the SFA needs to identify stressed trees at an early stage. To this end, high-resolution scenes from the MultiSpectral Instrument (MSI) on board the Sentinel-2 satellites were analyzed. However, ESA's Scene Classification Layer (SCL) is not accurate enough to reliably exclude scenes contaminated by thin clouds.
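For context, a scene-level screening based on the SCL could look like the minimal Python sketch below. It assumes that the Sentinel-2 L2A SCL class codes 3 (cloud shadows), 8 (cloud medium probability), 9 (cloud high probability) and 10 (thin cirrus) mark cloud-affected pixels. The function names and the 1 % acceptance threshold are hypothetical and do not describe the SFA's actual procedure. The sketch shows that such a filter can only be as good as the SCL labels. For example, a thin cloud that the SCL labels as vegetation (class 4) passes unnoticed.

```python
# Sentinel-2 L2A SCL codes treated as cloud-affected in this sketch:
# 3 = cloud shadows, 8 = cloud medium probability,
# 9 = cloud high probability, 10 = thin cirrus. Code 0 = no data.
CLOUDY_SCL = {3, 8, 9, 10}
NO_DATA = 0

def scl_cloud_fraction(scl):
    """Fraction of valid pixels in a 2D grid of SCL codes that are cloud-affected."""
    valid = [c for row in scl for c in row if c != NO_DATA]
    if not valid:
        return 1.0  # an all-no-data scene is treated as unusable
    return sum(c in CLOUDY_SCL for c in valid) / len(valid)

def keep_scene(scl, max_cloud_fraction=0.01):
    """Hypothetical rule: keep a scene only if at most 1 % of valid pixels are flagged."""
    return scl_cloud_fraction(scl) <= max_cloud_fraction

# 4 = vegetation, 5 = not vegetated. A thin cloud mislabelled as vegetation
# is invisible to this filter; one flagged as thin cirrus (10) is caught.
thin_cloud_missed = [[4, 4, 4], [4, 4, 4]]
thin_cloud_flagged = [[4, 4, 4], [4, 10, 5]]
print(keep_scene(thin_cloud_missed), keep_scene(thin_cloud_flagged))  # True False
```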

To overcome this problem, the study turned to machine learning (ML). Integrating ML methods into remote sensing has significantly improved performance on many remote sensing tasks. A common difficulty, however, is that ML methods typically require large amounts of annotated training data. Annotation (i.e. classification of the data) is usually done manually or with a superior instrument (e.g. an active lidar). Because no such annotated database was available, a synthetic one (based on simulations rather than observations) was generated to train a multilayer perceptron (MLP). The dataset consists of 200,000 simulated data points. These cover different cloud types, cloud optical thicknesses (COT), cloud geometrical thicknesses and cloud heights, as well as different ground surfaces and atmospheric profiles. The MLP is trained to predict COT, which then serves as a proxy for the cloudy/clear decision.
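The training idea can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' setup. The forward model `simulate` is a hypothetical toy standing in for the radiative-transfer simulations: a non-absorbing cloud layer over a Lambertian surface, three bands and Gaussian noise. The network size, learning rate, target scaling and the COT threshold of 0.5 are also arbitrary choices. A one-hidden-layer MLP is trained by full-batch gradient descent to regress COT from top-of-atmosphere reflectances. Thresholding the predicted COT then yields a cloudy/clear mask.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n):
    """Hypothetical toy forward model standing in for radiative-transfer simulations.

    Returns top-of-atmosphere (TOA) reflectances in 3 bands and the cloud optical
    thickness (COT) of each sample; about 30 % of the samples are clear (COT = 0).
    """
    cot = rng.uniform(0.0, 10.0, n) * (rng.random(n) > 0.3)
    surface = rng.uniform(0.02, 0.30, (n, 3))      # assumed per-band surface albedo
    r_c = (cot / (cot + 13.0))[:, None]            # toy saturating cloud reflectance
    t_c = 1.0 - r_c                                # no absorption assumed
    # Adding formula: cloud layer over a Lambertian surface, incl. multiple reflections.
    toa = r_c + t_c**2 * surface / (1.0 - r_c * surface)
    toa += rng.normal(0.0, 0.005, toa.shape)       # assumed sensor noise
    return toa, cot

# Synthetic training set: standardize the inputs, scale COT to [0, 1].
X_raw, cot_train = simulate(2000)
mu, sigma = X_raw.mean(axis=0), X_raw.std(axis=0)
X = (X_raw - mu) / sigma
y = (cot_train / 10.0)[:, None]

# One hidden layer of tanh units with a linear output (a regression MLP).
hidden = 8
W1 = rng.normal(0.0, 1.0 / np.sqrt(3), (3, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, 1))
b2 = np.zeros(1)

def forward(Xs):
    a = np.tanh(Xs @ W1 + b1)
    return a, a @ W2 + b2

def mse():
    return float(np.mean((forward(X)[1] - y) ** 2))

initial_loss = mse()
lr = 0.02
for _ in range(5000):                              # full-batch gradient descent
    a, y_hat = forward(X)
    d_out = 2.0 * (y_hat - y) / len(X)             # dMSE/dy_hat
    d_hid = (d_out @ W2.T) * (1.0 - a**2)          # backprop through tanh
    W2 -= lr * (a.T @ d_out)
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_hid)
    b1 -= lr * d_hid.sum(axis=0)
final_loss = mse()

COT_THRESHOLD = 0.5                                # hypothetical cloudy/clear cut-off

def predict_cot(toa):
    return 10.0 * forward((toa - mu) / sigma)[1][:, 0]

X_test, cot_test = simulate(500)
cloudy_mask = predict_cot(X_test) > COT_THRESHOLD
```

In the study itself, the inputs would be MSI reflectances from the 200,000 simulated samples. Here `simulate`, `predict_cot` and `COT_THRESHOLD` are illustrative names only. Regressing a continuous COT instead of a binary label means the cloudy/clear threshold can be tuned afterwards without retraining.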

The performance of the proposed algorithm will be discussed in detail for both synthetic data (as used during training) and real satellite observations (never presented to the algorithm before). The MLP trained on 1D synthetic data was found to transfer directly to real observations without any additional training. Furthermore, it outperforms the ESA SCL.
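A comparison between an MLP-derived cloud mask and the SCL could be scored with standard binary-classification metrics, provided some reference labels are available. The pure-Python sketch below assumes that pixel-wise reference labels exist. The function names, the COT threshold and all numbers in the example are invented for illustration and are not results from the study.

```python
def cot_to_mask(cot_values, threshold):
    """Turn predicted COT into a cloudy (True) / clear (False) mask."""
    return [c > threshold for c in cot_values]

def scores(pred, ref):
    """Precision, recall, F1 and accuracy of a cloud mask ('cloudy' = positive class)."""
    tp = sum(p and r for p, r in zip(pred, ref))
    fp = sum(p and not r for p, r in zip(pred, ref))
    fn = sum(r and not p for p, r in zip(pred, ref))
    tn = sum((not p) and (not r) for p, r in zip(pred, ref))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": (tp + tn) / len(ref)}

# Invented toy example: three thin-cloud pixels followed by three clear pixels.
reference = [True, True, True, False, False, False]
mlp_mask = cot_to_mask([0.8, 1.5, 0.3, 0.1, 0.0, 0.6], threshold=0.5)
scl_mask = [False, True, False, False, False, False]   # SCL flags only one thin cloud

for name, mask in [("MLP", mlp_mask), ("SCL", scl_mask)]:
    print(name, scores(mask, reference))
```

With these toy values, both masks reach the same accuracy (4/6). However, the SCL-style mask misses two of the three thin-cloud pixels (recall 1/3 versus 2/3). This is why recall on the cloudy class is worth reporting separately when thin clouds are the concern.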

How to cite: Scheirer, R., Pirinen, A., Abid, N., Agues Paszkowsky, N., Ohlson Timoudas, T., Ceccobello, C., Kovács, G., and Persson, A.: Synthetic Data and AI - Teaching a Neural Network to Identify Clouds Despite the Lack of Annotated Observation Data, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11077, https://doi.org/10.5194/egusphere-egu24-11077, 2024.