EPSC Abstracts
Vol. 18, EPSC-DPS2025-58, 2025, updated on 09 Jul 2025
https://doi.org/10.5194/epsc-dps2025-58
EPSC-DPS Joint Meeting 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Vision Transformers for identifying asteroids interacting with secular resonances.
Valerio Carruba1,2, Safwan Aljbaae3, Evgeny Smirnov4, and Gabriel Caritá3
  • 1UNESP, School of Natural Sciences and Engineering, Mathematics, Guaratinguetá, Brazil (valerio.carruba@unesp.br)
  • 2Laboratório Interinstitucional de e-Astronomia, RJ 20765-000, Brazil.
  • 3National Space Research Institute (INPE), Postgraduate Division, São José dos Campos, SP 12227-310, Brazil.
  • 4Belgrade Astronomical Observatory, Volgina 7, Belgrade, Serbia

Asteroid families are groups of asteroids formed by a collision or fission event. Some asteroid families interact with secular resonances. Because of planetary perturbations, the pericenters and nodes of planets and asteroids precess with frequencies g and s, respectively. When the pericenter precession frequency of an asteroid (g) is close to that of Saturn (g6), i.e., g − g6 ≈ 0, the ν6 resonance occurs.

Figure (1): The location of main secular resonances in the (a, sin(i)) domain.

Contrary to mean-motion resonances, secular resonances cannot be easily identified in 2-D domains of proper elements. To determine whether an asteroid is in a secular resonance, we need to investigate the time behavior of its resonant argument. With over 8 million asteroids predicted to be discovered by the Vera C. Rubin Observatory, the traditional visual analysis of resonant arguments will no longer be feasible.

Figure (2): Examples of resonant arguments for asteroids circulating, alternating phases of circulation and libration, and in libration states.

The first deep learning approach for identifying asteroids interacting with secular resonances was introduced in Carruba et al. (2021), with a multi-layer perceptron model. This is a five-step process (a code sketch of steps 2 and 3 follows the list):

1. We integrate the asteroid orbits under the gravitational influences of all planets.

2. We compute the time series of the resonant argument.

3. Images of these time series are obtained for each asteroid.

4. The model is trained on a set of labeled image data.

5. The model predicts the labels for a set of test images.
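As an illustration of steps 2 and 3, here is a minimal Python sketch that builds the ν6 resonant argument σ = ϖ − ϖ6 from the integrated time series of the asteroid's and Saturn's longitudes of pericenter and saves it as an image; the function name, units, and image size are our illustrative assumptions, not the exact pipeline of Carruba et al. (2021).

    # Sketch of steps 2-3 (illustrative, not the published pipeline): given
    # the time series of an asteroid's longitude of pericenter (varpi) and
    # Saturn's (varpi6) from the N-body integration, compute the nu6
    # resonant argument sigma = varpi - varpi6 and save it as an image.
    import numpy as np
    import matplotlib.pyplot as plt

    def save_resonant_argument_image(t, varpi, varpi6, out_path):
        """t in yr, angles in degrees; writes a plot of sigma(t) to out_path."""
        sigma = np.mod(varpi - varpi6, 360.0)  # wrap the argument into [0, 360)
        fig, ax = plt.subplots(figsize=(2.24, 2.24), dpi=100)  # 224x224 pixels
        ax.scatter(t, sigma, s=1, c="k")
        ax.set_axis_off()  # the classifier needs pixels, not axis labels
        fig.savefig(out_path)
        plt.close(fig)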

Carruba et al. (2022) applied Convolutional Neural Networks (CNNs) to the classification of large databases of images, together with regularization techniques to correct for overfitting. In Carruba et al. (2024), digitally filtered images of resonant arguments were used to enhance the performance of CNNs. Finally, Carruba et al. (2025) presented the first application of Vision Transformers to this problem.

Convolutional neural networks, or CNNs, are a class of neural network models originally designed to work with two-dimensional image data. Their name derives from the convolutional layer. Convolution is a linear operation in which a two-dimensional array of weights, the filter, is multiplied with each local patch of an input array; sliding the filter across the whole input produces a two-dimensional array called the feature map. Three of the most commonly used CNN models are VGG (Simonyan & Zisserman 2014), Inception (Szegedy et al. 2015), and the Residual Network, or ResNet (He et al. 2015).
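As a concrete, deliberately simplified example, the sketch below builds a small CNN in Keras for classifying 224x224 grayscale images of resonant arguments into three classes (circulation, alternation, libration); the layer sizes and image dimensions are our assumptions and do not reproduce the architectures used in the papers above.

    # Minimal CNN sketch: convolutional layers produce feature maps, pooling
    # downsamples them, and a dense softmax head assigns one of three classes.
    import tensorflow as tf
    from tensorflow.keras import layers

    def build_cnn(input_shape=(224, 224, 1), n_classes=3):
        return tf.keras.Sequential([
            layers.Conv2D(32, 3, activation="relu", input_shape=input_shape),
            layers.MaxPooling2D(),
            layers.Conv2D(64, 3, activation="relu"),
            layers.MaxPooling2D(),
            layers.Flatten(),
            layers.Dropout(0.5),  # regularization against overfitting
            layers.Dense(n_classes, activation="softmax"),
        ])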

Figure (3): An example of the application of Vision Transformers to the analysis of an image.

The Vision Transformer (ViT) architecture was first applied to the classification of images of resonant arguments in Carruba et al. (2025). The ViT model is based on the Transformer architecture (Vaswani et al. 2017) and applies it directly to image data, without the need for CNNs. In the ViT approach, an input image is split into fixed-size patches, usually 1/10 of the image size, which are then linearly embedded and fed into the Transformer encoder. The Transformer encoder consists of a series of Transformer blocks, each made of two parts:

a. Self-Attention Mechanism: This allows the model to weigh the importance of the different image patches in the sequence relative to each other, enabling it to capture contextual relationships regardless of their distance in the input sequence.

b. Feed-Forward Neural Network: After the self-attention step, the output is passed through a feed-forward network, which applies transformations to the data independently for each position in the sequence.

Multiple Transformer blocks can be stacked to form a complete Transformer model, allowing it to capture long-range dependencies and global information within the image (a sketch of one such block follows the list below). Two key hyperparameters in our model are:

1. num_layers: The number of Transformer blocks.

2. num_heads: The number of attention heads in the Multi-Head Attention layer.
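The sketch below shows one such Transformer encoder block, with its two parts marked, and where num_heads and num_layers enter; the embedding size, sequence length, and layer widths are illustrative assumptions, and the patch embedding and classification head are omitted.

    # One Transformer encoder block (illustrative sizes): (a) self-attention
    # over the sequence of patch embeddings, (b) a position-wise feed-forward
    # network, each wrapped in a residual connection.
    import tensorflow as tf
    from tensorflow.keras import layers

    def transformer_block(x, num_heads, key_dim, mlp_dim=128):
        h = layers.LayerNormalization()(x)
        h = layers.MultiHeadAttention(num_heads=num_heads, key_dim=key_dim)(h, h)
        x = layers.Add()([x, h])        # residual around attention
        h = layers.LayerNormalization()(x)
        h = layers.Dense(mlp_dim, activation="gelu")(h)
        h = layers.Dense(x.shape[-1])(h)
        return layers.Add()([x, h])     # residual around the feed-forward net

    def build_vit_encoder(num_patches=100, embed_dim=64, num_layers=4, num_heads=4):
        # a 10x10 grid of patches (1/10 of the image side) gives 100 tokens
        inputs = tf.keras.Input(shape=(num_patches, embed_dim))
        x = inputs
        for _ in range(num_layers):     # num_layers: stacked Transformer blocks
            x = transformer_block(x, num_heads=num_heads, key_dim=embed_dim)
        return tf.keras.Model(inputs, x)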

We applied CNNs and ViTs to three publicly available databases of images of resonant arguments, for the ν6 (Carruba et al. 2022), g − 2g6 + g5 (Carruba et al. 2024a), and s − s6 − g5 + g6 (Carruba et al. 2024b) resonances.
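A hypothetical end-to-end run, reusing the build_cnn sketch above; the directory names, the layout of one sub-folder per class, and the epoch count are placeholders, not the structure of the published databases.

    # Hypothetical training/evaluation run with timing, since running time is
    # one of the metrics compared below; directory names are placeholders.
    import time
    import tensorflow as tf

    train_ds = tf.keras.utils.image_dataset_from_directory(
        "nu6_arguments/train", image_size=(224, 224), color_mode="grayscale")
    test_ds = tf.keras.utils.image_dataset_from_directory(
        "nu6_arguments/test", image_size=(224, 224), color_mode="grayscale")

    model = build_cnn()  # from the CNN sketch above
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])

    start = time.perf_counter()
    model.fit(train_ds, epochs=10)
    print(f"training time: {time.perf_counter() - start:.1f} s")
    model.evaluate(test_ds)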

Figure (4): Semi-logarithmic plot of the computational time for applying different methods of image classification.

The models' performance was superior when applied to images of filtered resonant arguments. ViT models outperformed CNNs both in running times (about 10 times faster) and in evaluation metrics, and their results are comparable to those of the models produced by the new LLM-based approach of Smirnov (2024).

References

Carruba, V., Aljbaae, S., Domingos, R. C., Barletta, W., 2021, Artificial Neural Network classification of asteroids in the M1:2 mean-motion resonance with Mars, MNRAS, 504, 692.

Carruba, V., Aljbaae, S., Carita, G., Domingos, R. C., Martins, B., 2022, Optimization of Artificial Neural Networks models applied to the identification of images of asteroids' resonant arguments, CMDA, 134, A59.

Carruba, V., Aljbaae, S., Domingos, R. C., Carita, G., Alves, A., Delfino, E. M. D. S., 2024, Digitally filtered resonant arguments for deep learning classification of asteroids in secular resonances, MNRAS, 531, 4432-4443.

Carruba, V., Aljbaae, S., Smirnov, E., Carita, G., 2025, Vision Transformers for identifying asteroids interacting with secular resonances, Icarus, 425, 116346.

Smirnov, E., 2024, Fast, Simple, and Accurate Time Series Analysis with Large Language Models: An Example of Mean-motion Resonances Identification, ApJ, 966, 220.

Simonyan, K., Zisserman, A., 2014, Very Deep Convolutional Networks for Large-Scale Image Recognition, arXiv:1409.1556.

Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A., 2015, Going deeper with convolutions, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1-9.

He, K., Zhang, X., Ren, S., Sun, J., 2015, Deep Residual Learning for Image Recognition, arXiv:1512.03385.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., Polosukhin, I., 2017, Attention is all you need, arXiv:1706.03762.

How to cite: Carruba, V., Aljbaae, S., Smirnov, E., and Caritá, G.: Vision Transformers for identifying asteroids interacting with secular resonances., EPSC-DPS Joint Meeting 2025, Helsinki, Finland, 7–12 Sep 2025, EPSC-DPS2025-58, https://doi.org/10.5194/epsc-dps2025-58, 2025.