Promises and limitations of machine-learning-based methods for satellite retrieval of solar surface irradiance
- Mines Paris, Université PSL, Centre Observation Impacts Energie (O.I.E.)
Accurate estimates of Surface Solar Irradiance (SSI) are of great interest in domains as varied as climatology, solar energy, architecture, and agriculture. SSI estimates derived from meteorological satellites provide continuous spatial and temporal coverage and have thus become an important, routinely used source of information for planning, operating, and forecasting the production of PV power systems. Inferring SSI from satellite images is, however, not straightforward; since the 1980s, multiple satellite-based retrieval approaches have been proposed, from early cloud-index methods to physically based ones. More recently, approaches based on machine learning (ML) have emerged; they learn a direct data-driven model linking satellite images to SSI ground measurements. Although only a few such works have been published, their practical efficiency has already been questioned. The objective of this paper is not to propose a new ML-based method but to better understand the promises and limitations of this emerging family of methods.
To do so, we implement simple multi-layer perceptron (MLP) models trained on different datasets of satellite-based radiance measurements from Meteosat Second Generation (MSG) collocated with SSI ground measurements. To test the models' ability to generalize in time and space, we use different locations and time periods for training and testing. How measurement stations are allocated to each group is also a crucial factor, so we study two setups to understand the model's behavior. In the first setup, stations are randomly assigned to each group, resulting in distinct but spatially interlaced training and test stations. In the second setup, we enforce a strict and large geographical separation, which allows us to evaluate the model's performance in locations outside its training area. In both cases, the performance of the ML-based retrieval model is compared to that of the operational CAMS radiation service (CRS), which is based on Heliosat-4, a state-of-the-art physical retrieval model.
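The two station-allocation setups can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the abstract does not say how the random split was drawn or how the geographical separation was enforced, so the test fraction, the seed, the `Station` type, and the longitude-cutoff-with-gap rule in `geographic_split` are all hypothetical choices made for illustration. The temporal separation between training and test periods is not shown.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Station:
    """Hypothetical ground station record (frozen so it is hashable)."""
    name: str
    lat: float  # degrees north
    lon: float  # degrees east


def random_split(stations, test_fraction=0.3, seed=0):
    """Setup 1: shuffle stations and assign them at random.

    Training and test stations are distinct, but they remain spatially
    interlaced because nothing constrains where each group lies.
    """
    rng = random.Random(seed)
    shuffled = list(stations)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def geographic_split(stations, cutoff_lon, gap_deg=5.0):
    """Setup 2 (assumed rule): geographically separated groups.

    Stations west of `cutoff_lon` are used for training, stations at or
    east of `cutoff_lon + gap_deg` for testing; stations inside the gap
    band are dropped so the two groups never touch.
    """
    train = [s for s in stations if s.lon < cutoff_lon]
    test = [s for s in stations if s.lon >= cutoff_lon + gap_deg]
    return train, test


if __name__ == "__main__":
    # Synthetic stations, for illustration only.
    demo = [Station(f"S{i}", 40.0 + i, -10.0 + 3.0 * i) for i in range(10)]
    tr, te = geographic_split(demo, cutoff_lon=5.0)
    print([s.name for s in tr], [s.name for s in te])
```

A longitude cutoff is only one way to enforce separation; requiring a minimum great-circle distance between every training and every test station would be another, and the choice may affect how far outside its training area the model is evaluated.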
Our results show that the data-driven model can perform much better than CRS but is highly dependent on the training set, which raises generalization problems. In the first setup, the ML model has a Root Mean Square Error (RMSE) almost 20% lower than that of CRS, whereas in the second setup, where training and test stations are geographically separated, the CRS RMSE is only 4% higher than that of the ML model. Perhaps more critically, in the first setup the ML model RMSE is lower than or comparable to that of CRS at every test station, whereas in the setup enforcing geographical separation the ML model underperforms dramatically at several test locations.
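The two headline figures are phrased relative to different models ("ML RMSE lower than CRS" versus "CRS RMSE higher than ML"), and a relative difference depends on which RMSE is taken as the reference. The sketch below assumes the usual RMSE definition and a relative difference normalized by the reference model's RMSE; the numbers in the usage line are made up for illustration and are not results from the study.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between paired estimates and measurements."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(predicted))


def relative_difference(rmse_model, rmse_reference):
    """RMSE change of `rmse_model` as a fraction of `rmse_reference`.

    Negative values mean the model has a lower RMSE than the reference.
    """
    return (rmse_model - rmse_reference) / rmse_reference


if __name__ == "__main__":
    # Hypothetical RMSE values in W/m2, chosen only to show the asymmetry.
    rmse_ml, rmse_crs = 80.0, 100.0
    print(relative_difference(rmse_ml, rmse_crs))  # ML vs CRS reference
    print(relative_difference(rmse_crs, rmse_ml))  # CRS vs ML reference
```

With these illustrative values, the ML RMSE is 20% lower than the CRS RMSE, while the CRS RMSE is 25% higher than the ML RMSE, so the two phrasings are not interchangeable.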
ML models have great potential for satellite retrieval, but their inability to generalize in certain configurations could be critical and could hinder their deployment in regions with sparse measurement networks. A hybrid approach combining data-driven and physical models appears to be a promising direction for further research.
How to cite: Verbois, H., Becquet, V., Saint-Drenan, Y.-M., Gschwind, B., and Blanc, P.: Promises and limitations of machine-learning-based methods for satellite retrieval of solar surface irradiance, EMS Annual Meeting 2023, Bratislava, Slovakia, 4–8 Sep 2023, EMS2023-460, https://doi.org/10.5194/ems2023-460, 2023.