- 1 Tianjin Key Laboratory for Water Environment and Resources, Tianjin Normal University, Tianjin, China (haoyh@sxu.edu.cn)
- 2 Department of Mechanical and Electrical Engineering, Shanxi Institute of Energy, Taiyuan, China (anlx@sxie.edu.cn)
Spring discharge modelling is often constrained by limited data availability. To address this challenge, we propose a hybrid framework that combines TimeGAN-based data augmentation with LSTM and GRU models for spring discharge forecasting, and we apply it to the Niangziguan Spring in northern China. First, TimeGAN is trained on the limited historical record to learn its statistical properties and temporal dynamics and is then used to generate synthetic sequences. To evaluate the usefulness of the generated data, we generate synthetic sequences of the same length as the original training set, train LSTM and GRU models separately on (i) the observed data and (ii) the synthetic data, and compare their performance on the test set. The two sets of models show comparable test performance, indicating that the synthetic sequences reproduce the temporal dynamics and statistical properties that are critical for the prediction task and are functionally equivalent to the observed data for model training.
Next, TimeGAN is used to expand the training set to between one and six times its original size. We use t-distributed stochastic neighbour embedding (t-SNE) to visualise the distributional consistency between observed and synthetic samples. This qualitative assessment shows that the similarity in local structure and distribution patterns increases with the amount of generated data: synthetic data quality improves markedly when the synthetic dataset reaches three to four times the size of the original dataset, whereas further increases beyond that yield no evident additional improvement. Overall, the synthetic data increase sample diversity while remaining consistent with the original time-series distribution, thereby strengthening model learning when incorporated into the training set.
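As an illustration only, the expansion step can be sketched as mixing the observed training sequences with k times as many synthetic ones. The abstract does not describe how the sets are assembled, so the function name `build_augmented_set`, the random sampling from a pool of generated sequences, and the final shuffle are all assumptions rather than the authors' procedure.

```python
import random


def build_augmented_set(observed, synthetic_pool, k, seed=0):
    """Combine observed sequences with k times as many synthetic sequences.

    Hypothetical helper: `observed` and `synthetic_pool` are lists of
    training sequences (e.g. fixed-length discharge windows), and `k`
    is the number of synthetic sequences added per observed sequence.
    """
    n_needed = k * len(observed)
    if n_needed > len(synthetic_pool):
        raise ValueError("not enough synthetic sequences for the requested ratio")
    rng = random.Random(seed)
    # Draw synthetic sequences without replacement (assumed sampling scheme).
    chosen = rng.sample(synthetic_pool, n_needed)
    combined = list(observed) + chosen
    # Shuffle so observed and synthetic sequences are interleaved in training.
    rng.shuffle(combined)
    return combined
```

With two observed sequences and k = 3, the returned set holds eight sequences: the two observed ones plus six drawn from the synthetic pool.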
To quantitatively assess the effect of augmentation, we compare the hybrid models with the baseline LSTM and GRU models using training sets with observed-to-synthetic data ratios ranging from 1:1 to 1:4. Results show that both hybrid models consistently outperform their respective baselines across all evaluation metrics (MAE, MAPE, RMSE, and NSE) during training, validation, and testing, demonstrating the effectiveness of TimeGAN-based data augmentation. Notably, performance does not improve linearly with increasing volumes of synthetic data; an optimal observed-to-synthetic ratio of 1:3 is identified. At this ratio, the test NSE reaches 0.91 for the TimeGAN–LSTM model and 0.94 for the TimeGAN–GRU model. Increasing the ratio to 1:4 results in a slight performance decline (e.g. the test NSE decreases from 0.91 to 0.90 for TimeGAN–LSTM and from 0.94 to 0.93 for TimeGAN–GRU), which is likely attributable to minor distributional deviations introduced by excessive synthetic data. These findings highlight the need to determine an appropriate augmentation ratio in generative data augmentation.
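For reference, the four evaluation metrics named above can be computed as in the following minimal sketch, which uses their standard textbook definitions. The function name `evaluate` and the choice to report MAPE in percent are assumptions; the abstract does not state how the authors implemented them.

```python
import math


def evaluate(obs, sim):
    """Return MAE, MAPE (%), RMSE, and NSE for observed vs simulated values.

    Assumes equal-length sequences, non-zero observations (for MAPE),
    and non-constant observations (for NSE).
    """
    n = len(obs)
    errors = [s - o for o, s in zip(obs, sim)]
    sq_err = sum(e * e for e in errors)
    mean_obs = sum(obs) / n
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 * sum(abs(e / o) for e, o in zip(errors, obs)) / n,
        "RMSE": math.sqrt(sq_err / n),
        # Nash-Sutcliffe efficiency: 1 minus the ratio of the squared error
        # to the variance of the observations around their mean.
        "NSE": 1.0 - sq_err / sum((o - mean_obs) ** 2 for o in obs),
    }
```

An NSE of 1 indicates a perfect match, while an NSE of 0 means the model is no better than predicting the mean of the observations, which gives context for the reported test values of 0.91 and 0.94.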
Across all metrics, and particularly at the optimal ratio, TimeGAN–GRU outperforms TimeGAN–LSTM. This advantage is attributed to the GRU’s streamlined architecture, fewer parameters, and stronger adaptability to the “denoised” synthetic sequences generated by TimeGAN, thereby improving prediction accuracy and robustness under data-scarce conditions. Overall, this study demonstrates the effectiveness of TimeGAN in alleviating hydrological data scarcity and provides a practical and quantifiable approach for hydrological time-series prediction in small-sample settings.
How to cite: Hao, Y. and An, L.: A TimeGAN-Augmented LSTM/GRU Framework for Spring Discharge Forecasting Under Limited Data, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-3628, https://doi.org/10.5194/egusphere-egu26-3628, 2026.