- ¹Queen Mary University of London, London, UK (enrico.camporeale@qmul.ac.uk)
- ²University of Colorado, SWx-TREC, Boulder, United States of America
Accurate prediction of rare but high-impact events is a recurring challenge in planetary science and heliophysics, where strongly imbalanced data distributions are common (e.g. extreme space-weather conditions). Standard empirical risk minimization tends to bias machine-learning models toward frequently observed regimes, often leading to poor performance on scientifically and operationally critical tail events. Existing mitigation strategies, such as loss re-weighting or synthetic over-sampling, have shown mixed and problem-dependent success.
We present PARIS (Pruning Algorithm via the Representer theorem for Imbalanced Scenarios), a data-centric framework that addresses imbalance by optimizing the training dataset itself rather than modifying the loss function or model architecture. PARIS exploits the representer theorem for neural networks to compute a closed-form representer deletion residual, which quantifies the change in validation loss induced by removing an individual training sample, without requiring retraining. Combined with an efficient Cholesky rank-one downdating scheme, this residual enables fast, iterative pruning of uninformative or performance-degrading samples.
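To make the role of the Cholesky rank-one downdate concrete, the following minimal sketch shows how removing one training sample can be evaluated without refitting from scratch. It is an illustration under stated assumptions, not the PARIS implementation: it assumes a ridge regression on fixed features (for example, the last-layer activations of a trained network), so that deleting sample i subtracts the rank-one term φ_i φ_iᵀ from the regularized matrix ΦᵀΦ + λI. The function names, the ridge setting, and the use of validation mean squared error are all hypothetical choices made for this example; the abstract does not specify the form of the representer deletion residual.

```python
import numpy as np


def chol_downdate(L, x):
    """Return lower-triangular L_new with L_new @ L_new.T == L @ L.T - outer(x, x).

    Standard rank-one Cholesky downdate, O(n^2) instead of O(n^3) refactorization.
    """
    L = L.astype(float).copy()
    x = x.astype(float).copy()
    n = x.size
    for k in range(n):
        r2 = L[k, k] ** 2 - x[k] ** 2
        if r2 <= 0.0:
            raise np.linalg.LinAlgError("downdated matrix is not positive definite")
        r = np.sqrt(r2)
        c = r / L[k, k]
        s = x[k] / L[k, k]
        L[k, k] = r
        # Update the rest of column k, then carry the remainder of x forward.
        L[k + 1:, k] = (L[k + 1:, k] - s * x[k + 1:]) / c
        x[k + 1:] = c * x[k + 1:] - s * L[k + 1:, k]
    return L


def val_mse_after_removal(L, b, phi_i, y_i, Phi_val, y_val):
    """Validation MSE of the ridge solution after deleting one training sample.

    Assumption for this sketch: L is the Cholesky factor of Phi.T @ Phi + lam * I
    and b = Phi.T @ y for the full training set.
    """
    L_new = chol_downdate(L, phi_i)   # factor of A - phi_i phi_i^T
    b_new = b - y_i * phi_i           # right-hand side without sample i
    # Two solves with the factor (a triangular solver would be used in practice).
    w = np.linalg.solve(L_new.T, np.linalg.solve(L_new, b_new))
    return float(np.mean((Phi_val @ w - y_val) ** 2)), L_new


# Synthetic usage example (random data, purely illustrative).
rng = np.random.default_rng(0)
Phi, y = rng.normal(size=(50, 5)), rng.normal(size=50)
Phi_val, y_val = rng.normal(size=(20, 5)), rng.normal(size=20)
lam = 0.1

A = Phi.T @ Phi + lam * np.eye(5)
L = np.linalg.cholesky(A)
b = Phi.T @ y

w_full = np.linalg.solve(A, b)
base_mse = float(np.mean((Phi_val @ w_full - y_val) ** 2))

# Change in validation loss caused by deleting each sample in turn.
deltas = np.array([
    val_mse_after_removal(L, b, Phi[i], y[i], Phi_val, y_val)[0] - base_mse
    for i in range(Phi.shape[0])
])
worst = int(np.argmin(deltas))  # sample whose removal lowers validation MSE most
```

In an iterative pruning loop of this kind, the sample with the most negative change would be dropped and the downdated factor reused as the starting point for the next round, which is why a cheap factor update matters.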
We demonstrate PARIS on a real-world space-weather regression problem (Dst prediction), where it reduces the training set by up to 75% while preserving or improving overall RMSE and outperforming loss re-weighting, synthetic over-sampling, and boosting baselines. These results highlight representer-guided dataset pruning as a computationally efficient, interpretable, and physically relevant approach for improving rare-event regression in heliophysics and related planetary science applications.
Preprint: https://www.arxiv.org/abs/2512.06950
How to cite: Camporeale, E.: PARIS: Pruning Algorithm via the Representer theorem for Imbalanced Scenarios, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-6702, https://doi.org/10.5194/egusphere-egu26-6702, 2026.