OOS2025-1575, updated on 26 Mar 2025
https://doi.org/10.5194/oos2025-1575
One Ocean Science Congress 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
A little bird told me: transfer learning for automatic detection and classification of blue and fin whale low-frequency vocalizations in the Southern Ocean
Lucie Jean-Labadye1,2,3, Gabriel Dubus1,2, Dorian Cazau2, Nicolas Farrugia3, and Olivier Adam1
  • 1Sorbonne University, CNRS, Institut d’Alembert UMR 7190, LAM, Paris, France (lucie.jean-labadye@dalembert.upmc.fr)
  • 2ENSTA Bretagne Lab-STICC - UMR CNRS 6285, Brest, France
  • 3IMT Atlantique, CNRS, Lab-STICC, Brest, France

With the increasing availability and efficiency of sound recording devices, bioacoustics is becoming a leading tool for monitoring vocally active populations and/or behaviors, owing to its long-term, non-invasive nature. This particularly applies to marine mammals, which are among the most vocal contributors to underwater soundscapes. Passive acoustic monitoring (PAM) facilitates such studies, yet manually annotating the resulting data remains time-consuming and labor-intensive. The use of deep neural networks (DNN) is becoming increasingly common in the field, but it remains challenging. Numerous complexities arise, from the data acquisition phase (wide vocal repertoire even from a single sound source, high variability of surrounding environments, scarcity of vocal activity resulting in imbalanced datasets, very low signal-to-noise ratio (SNR) due to ambient noise) to pre-processing (noisy labels due to inter-annotator variability, limited availability of large annotated datasets).

Transfer learning methods have shown great performance in tackling these issues, leveraging models pre-trained on larger and/or better-annotated datasets and fine-tuning them for specific detection/classification tasks. These methods make use of pre-learned embeddings (i.e., representations of the data in the model's latent space) to process similar new data and classify it into new classes learned during the fine-tuning phase.

In this study, we explore the potential of Perch (Ghani et al., 2023), a Google model trained on the community-driven database Xeno-Canto, which contains thousands of annotated hours of bird songs. Although originally a bird-species classifier, Perch has also proven to be a powerful and versatile embedding generator, outperforming other models such as BirdNET on the Watkins Marine Mammal Sound Database through linear probing. This finding motivated us to feed the embeddings generated by Perch into a Multi-Layer Perceptron (MLP) fine-tuned on the largest publicly available labeled underwater acoustic dataset (Miller et al., 2024), mostly dedicated to low-frequency blue and fin whale vocalizations. To evaluate the generalization ability of this model, we validated it on acoustic datasets from different geographical areas and/or different temporal ranges.
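The approach described above — a frozen pre-trained backbone whose embeddings are classified by a small trainable head — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the synthetic arrays stand in for precomputed Perch embeddings (dimension assumed here to be 1280) and for call-type labels, and the scikit-learn MLP stands in for the fine-tuned classification head.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical stand-in data: in practice, `embeddings` would be the
# frozen Perch outputs for each audio segment and `labels` the annotated
# classes (e.g. blue whale call, fin whale call, background noise).
rng = np.random.default_rng(42)
n_per_class, emb_dim, n_classes = 200, 1280, 3
centers = rng.normal(0.0, 1.0, size=(n_classes, emb_dim))
embeddings = np.vstack([
    centers[c] + 0.8 * rng.normal(size=(n_per_class, emb_dim))
    for c in range(n_classes)
])
labels = np.repeat(np.arange(n_classes), n_per_class)

X_tr, X_te, y_tr, y_te = train_test_split(
    embeddings, labels, test_size=0.25, stratify=labels, random_state=0
)

# Small MLP head trained on top of the frozen embeddings;
# the backbone that produced them is never updated in this scheme.
clf = MLPClassifier(hidden_layer_sizes=(128,), max_iter=300, random_state=0)
clf.fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"held-out accuracy: {acc:.2f}")
```

The design choice this illustrates is that only the lightweight head is trained, which is what makes the method practical on small, imbalanced bioacoustic datasets.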

A key point of this work is that the main results are based on the largest publicly available labeled acoustic database (Miller et al., 2021) and are fairly compared with state-of-the-art methods thanks to the first benchmark published alongside this dataset (Schall et al., 2024).

How to cite: Jean-Labadye, L., Dubus, G., Cazau, D., Farrugia, N., and Adam, O.: A little bird told me: transfer learning for automatic detection and classification of blue and fin whale low-frequency vocalizations in the Southern Ocean, One Ocean Science Congress 2025, Nice, France, 3–6 Jun 2025, OOS2025-1575, https://doi.org/10.5194/oos2025-1575, 2025.