Synthetic PV Data for Energy Communities

Petrina Papazek; Irene Schicker

doi:https://doi.org/10.5194/egusphere-egu26-19728

EGU26-19728, updated on 14 Mar 2026

EGU General Assembly 2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

Petrina Papazek and Irene Schicker

GeoSphere Austria, Analysis and Model Development, Wien, Austria (petrina.papazek@geosphere.at)

Accurate and transferable photovoltaic (PV) power forecasting is essential for grid operation and energy system planning, particularly as PV installations continue to expand and energy communities increasingly rely on decentralized, locally managed generation. However, PV production is inherently site-specific, and many community-scale systems lack sufficiently long and continuous observation records to support robust data-driven forecasting approaches.

We present a scalable machine-learning nowcasting framework designed to support PV forecasting for energy communities. The approach integrates (downscaled) spatial radiation nowcasts and combines openly available meteorological data with local information to generate PV power forecasts tailored to individual PV systems or entire communities. It builds on semi-synthetic data generation and post-processing techniques and is specifically designed for data-scarce environments. High-resolution local weather prediction model output, such as that of our in-house post-processing model INCA, is used as the primary source of covariates, complemented by satellite-derived radiation products from CAMS and reanalysis data from ERA5.

The methodology follows a two-fold strategy to address insufficient historical PV data. Where individual PV systems or communities provide enough measured production data for supervised learning, semi-synthetic PV time series are generated using classical approaches based on auxiliary meteorological and radiation data. In this setting, Random Forest models are employed because they are robust on limited, seasonal datasets and capture nonlinear feature interactions without excessive overfitting. Where observational data are extremely scarce, an alternative strategy based on pre-trained foundation models is applied. These models are driven by a set of meteorological and temporal covariates and calibrated using forecast radiation fields converted into site-specific PV power via PVLib and detailed PV metadata (e.g. system geometry, technical parameters, and location). In both cases, the semi-synthetic PV time series are used to augment the training data and optimize data-driven nowcasting.
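As a rough illustration of how a radiation value and PV metadata can be turned into a synthetic power value, the sketch below uses a simplified PVWatts-style DC model with a NOCT cell-temperature estimate. It is a stand-in written in plain Python, not the PVLib-based conversion used in this work: the `PVSystem` fields, the default coefficients, and the assumption that plane-of-array irradiance is already available (i.e. that the transposition from system geometry has been done upstream) are all illustrative choices.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PVSystem:
    """Hypothetical PV metadata record; field names and defaults are assumptions."""
    p_dc0_kw: float                      # rated DC power at standard test conditions
    gamma_pdc: float = -0.004            # power temperature coefficient (1/degC), assumed
    noct_c: float = 45.0                 # nominal operating cell temperature (degC), assumed
    p_ac_max_kw: Optional[float] = None  # inverter limit; None disables clipping


def cell_temperature(poa_wm2: float, t_air_c: float, noct_c: float) -> float:
    """NOCT approximation: cell heating scales linearly with irradiance."""
    return t_air_c + (noct_c - 20.0) / 800.0 * poa_wm2


def synthetic_pv_power(poa_wm2: float, t_air_c: float, system: PVSystem) -> float:
    """Return synthetic PV power in kW for one time step.

    poa_wm2 is plane-of-array irradiance in W/m^2, assumed to be precomputed.
    """
    poa = max(poa_wm2, 0.0)  # guard against small negative irradiance values
    t_cell = cell_temperature(poa, t_air_c, system.noct_c)
    # PVWatts-style DC model: linear in irradiance, derated with cell temperature.
    power = system.p_dc0_kw * (poa / 1000.0) * (1.0 + system.gamma_pdc * (t_cell - 25.0))
    power = max(power, 0.0)
    if system.p_ac_max_kw is not None:
        power = min(power, system.p_ac_max_kw)  # crude inverter clipping
    return power


if __name__ == "__main__":
    roof = PVSystem(p_dc0_kw=5.0, p_ac_max_kw=4.6)
    for poa, t_air in [(0.0, 5.0), (400.0, 12.0), (950.0, 28.0)]:
        print(poa, t_air, round(synthetic_pv_power(poa, t_air, roof), 3))
```

Applied to each time step of a forecast radiation series, a function of this kind yields a semi-synthetic PV time series that can stand in for missing measurements when training or calibrating a model.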

Model performance is evaluated across a diverse set of PV sites and compared against persistence and climatological baselines. Results indicate that semi-synthetic data combined with local covariates provide a robust approach for transferable PV power nowcasting and are useful for energy community use cases.
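A comparison against persistence and climatological baselines can be sketched as follows. The abstract does not state which error metric is used, so RMSE and the skill score 1 - RMSE_model / RMSE_reference are assumptions here, as are all function names and the time-of-day definition of climatology.

```python
import math
from typing import List, Sequence, Tuple


def rmse(forecast: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error between two equally long series."""
    if len(forecast) != len(observed) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(observed))


def persistence_pairs(observed: Sequence[float], horizon: int) -> Tuple[List[float], List[float]]:
    """Persistence baseline: the forecast for step t is the observation at t - horizon.

    Returns (forecast, target) aligned element by element.
    """
    if horizon < 1 or horizon >= len(observed):
        raise ValueError("horizon must be between 1 and len(observed) - 1")
    return list(observed[:-horizon]), list(observed[horizon:])


def climatology_profile(history: Sequence[float], steps_per_day: int) -> List[float]:
    """Climatological baseline: mean historical value per time-of-day slot.

    Assumes history starts at slot 0 and covers at least one full day.
    """
    sums = [0.0] * steps_per_day
    counts = [0] * steps_per_day
    for i, value in enumerate(history):
        sums[i % steps_per_day] += value
        counts[i % steps_per_day] += 1
    return [s / c for s, c in zip(sums, counts)]


def skill_score(rmse_model: float, rmse_reference: float) -> float:
    """Positive values mean the model beats the reference; 1.0 is a perfect forecast."""
    if rmse_reference == 0.0:
        raise ValueError("reference RMSE is zero; skill score is undefined")
    return 1.0 - rmse_model / rmse_reference
```

In use, the model RMSE at a given lead time would be passed to `skill_score` together with the RMSE of each baseline at the same lead time, giving one skill value per site and horizon.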

How to cite: Papazek, P. and Schicker, I.: Synthetic PV Data for Energy Communities, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-19728, https://doi.org/10.5194/egusphere-egu26-19728, 2026.