EGU24-8933, updated on 08 Mar 2024
EGU General Assembly 2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

An Austrian case study on empowering ReduceData solar power forecasting using a ML-driven semi-synthetic data generator

Petrina Papazek, Pascal Gfäller, and Irene Schicker
  • Geosphere Austria, Analysis and Model Development, Wien, Austria

Heterogeneous, location-dependent solar power (PV) installations each exhibit their own production characteristics. This challenges power grid operators: to feed in PV production, which is also highly variable, they need power forecasts at very high temporal and spatial resolution, ideally tailored to each site. Technological advances and the expansion of solar energy often lead to changes in a site's initial setup, significantly altering its production data over the record period. The resulting inconsistencies in historic data, as well as short record periods (e.g., for newly built sites), pose challenges in the renewable sector. This leads to a common issue in AI-driven post-processing: machine learning and AI-powered forecasts rely heavily on sufficient, consistent historic data, even more so when simulating expected production peaks at high temporal resolution is part of the requirements. To cope with such reduced historic data, the ReduceData project aims to generate semi-synthetic data, providing a sufficiently representative and continuous data set across multiple data sources. Building on random forest models, we exploit non-reduced auxiliary data that are strongly associated with the target in space and time, such as satellite data products (e.g., CAMS) and reanalysis fields (e.g., ERA5). Because their records are limited, PV production data and high-resolution numerical model data (e.g., AROME) are the targets of our semi-synthetic data generator. The presented case study focuses on nowcasting to short-range forecasts with a 15-minute update frequency, tailored to selected solar power production sites in eastern Austria. We study to what extent deep learning methods benefit from a consistent semi-synthetic data set built on different raw data sources, highlighting the added value of combining various sources via deep learning.
Inputs for the AI-driven post-processing include, for instance, the climatology of satellite data and reanalysis, pvlib estimates, AROME surface parameters, and output of in-house nowcasting models (e.g., IrradPhyD-Net). Different settings of the semi-synthetic data generator are evaluated by cross-validation. In most of the cases studied, we achieve high skill relative to the available classical reference methods (e.g., persistence, climatology).
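As an illustration of how skill relative to persistence and climatology can be quantified, the following minimal Python sketch computes a mean-absolute-error skill score. The abstract does not state which error metric or score definition was used, so the MAE-based score, the function names, and the sample values are all assumptions made purely for illustration; in particular, the climatology here is simply the mean of the sample itself.

```python
from statistics import mean


def mae(forecast, observed):
    """Mean absolute error between two equally long sequences."""
    return mean(abs(f - o) for f, o in zip(forecast, observed))


def persistence_forecast(observed, lead_steps=1):
    """Persistence: the forecast for step t is the observation at t - lead_steps."""
    return observed[:-lead_steps]


def skill_score(model_error, reference_error):
    """1 = perfect, 0 = no better than the reference, < 0 = worse."""
    return 1.0 - model_error / reference_error


# Hypothetical 15-minute PV production values in kW (illustrative only).
observed = [0.0, 1.0, 3.0, 6.0, 8.0, 7.0, 4.0, 2.0]
# Hypothetical model forecasts for steps 1..7 (one 15-minute step ahead).
model = [0.5, 2.5, 5.5, 8.5, 7.5, 4.5, 2.5]

target = observed[1:]                            # values being forecast
persistence = persistence_forecast(observed, 1)  # previous observation
climatology = [mean(observed)] * len(target)     # in-sample mean (assumption)

skill_vs_persistence = skill_score(mae(model, target), mae(persistence, target))
skill_vs_climatology = skill_score(mae(model, target), mae(climatology, target))

print(f"skill vs persistence: {skill_vs_persistence:.2f}")
print(f"skill vs climatology: {skill_vs_climatology:.2f}")
```

A positive score means the model error is smaller than that of the reference method. In practice, the climatology would come from a separate long-term record rather than from the evaluation sample.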

How to cite: Papazek, P., Gfäller, P., and Schicker, I.: An Austrian case study on empowering ReduceData solar power forecasting using a ML-driven semi-synthetic data generator, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-8933, 2024.