Multi-frame cloud prediction from all-sky images: RGB vs segmented masks

Javier Gatón; Roberto Román; Cesar Guzman; Daniel González-Fernández; Bruno Longarela; Celia Herrero del Barrio; Sara Herrero-Anta; Ramiro González; Carlos Toledano

doi:https://doi.org/10.5194/egusphere-egu26-9705

Session AS3.12

EGU26-9705, updated on 27 Apr 2026

EGU General Assembly 2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

Javier Gatón^1,2, Roberto Román^1,2, Cesar Guzman^3, Daniel González-Fernández^1,2, Bruno Longarela^1,2, Celia Herrero del Barrio^1,2, Sara Herrero-Anta^1,2, Ramiro González^1,2, and Carlos Toledano^1,2

¹Group of Atmospheric Optics (GOA-UVa), Universidad de Valladolid, Valladolid, 47011, Spain
²Laboratory of Disruptive Interdisciplinary Science (LADIS), Universidad de Valladolid, Valladolid, 47011, Spain
³DriMT AI Space, Valga, 68204, Estonia

Short-term forecasting of cloud position is essential for improving solar irradiance nowcasting, the management of photovoltaic systems, and atmospheric monitoring. In this work, we evaluate the impact of replacing RGB all-sky images with semantically segmented sky masks as the input representation for multi-frame cloud motion prediction, assuming that a sky segmentation model is available. To this end, we have adapted a ConvLSTM (Shi et al., 2015) backbone to operate on five-class segmentation masks (cloud-free, cloud, thin cloud, sun, other), enabling a controlled comparison with an RGB-based ConvLSTM. Training and evaluation are performed using the SKIPP’D dataset (Nie et al., 2023): around 58,000 videos with 1-min resolution. To ensure consistent evaluation, all predictions and ground-truth frames are processed through a common segmentation model. Model performance is therefore evaluated in the segmentation label space, using segmenter-derived masks as a proxy reference rather than physical ground truth.
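As an illustration of what operating on five-class masks can look like in practice, the following is a minimal sketch that converts a label mask into one binary plane per class and collapses per-class scores back to labels. The class order follows the list above, but the integer ids, the function names, and the choice of one-hot encoding are assumptions made for this example; the abstract does not state how the masks are fed to the ConvLSTM.

```python
# Class names come from the abstract; the integer ids (0..4) are an assumption.
CLASSES = ("cloud-free", "cloud", "thin cloud", "sun", "other")


def one_hot(mask):
    """Turn an H x W grid of class ids into a C x H x W stack of 0/1 planes."""
    return [
        [[1 if label == c else 0 for label in row] for row in mask]
        for c in range(len(CLASSES))
    ]


def to_labels(scores):
    """Collapse C x H x W per-class scores to an H x W grid of class ids (argmax)."""
    n_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(n_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    # Tiny hypothetical 2 x 3 mask: mostly cloud-free, some cloud, one sun pixel.
    frame = [[0, 1, 1],
             [0, 3, 2]]
    planes = one_hot(frame)
    print(len(planes), "class planes")
    print(to_labels(planes) == frame)
```

A sequence of such frames would give a time-ordered stack of class planes, and an argmax over the predicted planes would return a prediction to the label space in which the evaluation is carried out.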

Operating in the semantic mask space improves temporal stability and agreement with reference masks across standard segmentation metrics. On average, it increases the Intersection over Union by 0.49% and the Dice coefficient by 0.94% relative to the RGB baseline. Improvements are most notable for the dominant classes, cloud and cloud-free, while performance on thin-cloud and sun pixels remains limited due to their lower frequency, intrinsic semantic ambiguity, and the reduced spatial resolution of the dataset. The results also show a trade-off: precision improves at the cost of reduced recall.
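For reference, the metrics named above can be computed per class from pixel counts of true positives (TP), false positives (FP), and false negatives (FN): IoU = TP / (TP + FP + FN), Dice = 2TP / (2TP + FP + FN), precision = TP / (TP + FP), and recall = TP / (TP + FN). The sketch below applies these standard definitions to a pair of label masks. The function names and the toy masks are hypothetical, and how the study averages over classes, frames, and videos is not stated in the abstract.

```python
def class_counts(pred, ref, cls):
    """Count TP, FP, and FN pixels for one class over two H x W label grids."""
    tp = fp = fn = 0
    for pred_row, ref_row in zip(pred, ref):
        for p, r in zip(pred_row, ref_row):
            if p == cls and r == cls:
                tp += 1
            elif p == cls:
                fp += 1
            elif r == cls:
                fn += 1
    return tp, fp, fn


def class_metrics(pred, ref, cls):
    """Return IoU, Dice, precision, and recall for one class (NaN if undefined)."""
    tp, fp, fn = class_counts(pred, ref, cls)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "iou": ratio(tp, tp + fp + fn),
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }


if __name__ == "__main__":
    # Hypothetical predicted and reference masks; 1 is assumed to mean "cloud".
    predicted = [[1, 1], [0, 0]]
    reference = [[1, 0], [1, 0]]
    print(class_metrics(predicted, reference, cls=1))
```

For a single class the two overlap scores are linked by Dice = 2 IoU / (1 + IoU), so they always rank a pair of predictions the same way, while precision and recall separate over-prediction from under-prediction of that class.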

These results indicate that introducing semantic information as an intermediate representation simplifies the prediction task and strengthens the model’s ability to capture cloud evolution patterns within a segmentation-based evaluation framework. While the present study does not provide end-to-end validation against irradiance measurements, it highlights the potential of segmentation-based approaches for future cloud nowcasting systems and motivates further work at higher spatial resolutions, with direct radiative validation, and with different network architectures.

This work was supported by the Ministerio de Ciencia e Innovación (MICINN) under grants no. PID2024-157697OB-I00 and TED2021-131211BI00375. Financial support from the Department of Education, Junta de Castilla y León, and FEDER Funds is acknowledged (CLU-2023-1-05). This work was funded by the European Commission through the EUBURNRISK project (INTERREG-SUDOE; S2/2.4/F0327). The authors acknowledge the support of COST Action CA21119 HARMONIA and the Spanish Ministry for Science and Innovation to ACTRIS ERIC.

X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, W.-C. Woo, Convolutional LSTM Network: A machine learning approach for precipitation nowcasting (2015). arXiv:1506.04214.

Y. Nie, X. Li, A. Scott, Y. Sun, V. Venugopal, A. Brandt, SKIPP’D: A sky images and photovoltaic power generation dataset for short-term solar forecasting, Solar Energy 255 (2023) 171–179.

How to cite: Gatón, J., Román, R., Guzman, C., González-Fernández, D., Longarela, B., Herrero del Barrio, C., Herrero-Anta, S., González, R., and Toledano, C.: Multi-frame cloud prediction from all-sky images: RGB vs segmented masks, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-9705, https://doi.org/10.5194/egusphere-egu26-9705, 2026.
