2020 International Joint Conference on Neural Networks (IJCNN)
DOI: 10.1109/ijcnn48605.2020.9207262
An Auto Encoder For Audio Dolphin Communication

Cited by 11 publications (10 citation statements) | References 12 publications
“…Several recent approaches offer interesting alternatives or potential improvements. The direct use of spectrograms, either as an image or as a parameter matrix, has already been applied to mice (Premoli et al., 2021), Atlantic spotted dolphins (Kohlsdorf, Herzing, & Starner, 2020), domestic cats (Pandeya, Kim, & Lee, 2018), and common marmosets (Oikarinen et al., 2019). However, their performance on complex and graded repertoires remains to be evaluated, and adaptation of spectrogram parameters to each species may be necessary (Knight et al., 2020).…”
Section: Future Work
confidence: 99%
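The "parameter matrix" use of spectrograms mentioned above can be sketched with a minimal short-time Fourier transform. This is an illustrative example only, not code from any of the cited papers; the frame length and hop size are arbitrary assumptions.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: windowed frames -> FFT magnitudes per frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Rows are time frames, columns are frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1))

# One second of a 1 kHz tone at an 8 kHz sample rate.
t = np.arange(8000) / 8000.0
spec = spectrogram(np.sin(2 * np.pi * 1000.0 * t))
print(spec.shape)  # (61, 129): 61 frames x 129 frequency bins
```

The resulting matrix is what a downstream model (image-style CNN or autoencoder) would consume; per-species tuning, as the quote notes, amounts to choosing `frame_len` and `hop` for the vocalization's time-frequency scale.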
“…The most common parametric dimensionality reduction algorithm is PCA, where a linear transform is learned between data and an embedding space. Similarly, neural networks such as autoencoders can be used to learn a set of basis features which can be complex and non-linear (Kohlsdorf et al., 2020; Sainburg et al., 2020c; Goffinet et al., 2021; Singh Alvarado et al., 2021). For example, an autoencoder trained on images of faces can learn to linearize the presence of glasses or a beard (Radford et al., 2015; Sainburg et al., 2018b, 2021).…”
Section: Extracting Relational Structure and Clustering
confidence: 99%
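The linear baseline that the statement above contrasts with autoencoders can be written in a few lines. This is a generic PCA sketch via SVD, not code from any cited work; the function name and dimensions are illustrative.

```python
import numpy as np

def pca_embed(X, k=2):
    """Project rows of X onto the top-k principal components (a learned
    linear transform from data space to embedding space)."""
    Xc = X - X.mean(axis=0)                       # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                          # k-dimensional embedding

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))                    # 100 samples, 10 features
Z = pca_embed(X, k=2)
print(Z.shape)  # (100, 2)
```

An autoencoder replaces the single matrix `Vt[:k]` with non-linear encoder and decoder networks, which is what lets it capture the curved structure the quote describes.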
“…Like most areas of deep learning, substantial progress has been made on the task of audio synthesis in the past few years. Basic methods comprise autoencoders (Engel et al., 2017; Kohlsdorf et al., 2020; Sainburg et al., 2020c), Generative Adversarial Networks (GANs) (Donahue et al., 2018; Engel et al., 2019; Sainburg et al., 2020c; Tjandra et al., 2020; Pagliarini et al., 2021), and autoregressive approaches (Mehri et al., 2016; Oord et al., 2016; Kalchbrenner et al., 2018; Prenger et al., 2019). One advantage of GAN-based models is that their loss is not defined directly by reconstruction loss, resulting in higher-fidelity syntheses (Larsen et al., 2016).…”
Section: Synthesizing Vocalizations
confidence: 99%
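The "reconstruction loss" that the statement above attributes to autoencoder training can be made concrete with a toy tied-weight linear autoencoder optimized by gradient descent. This is purely illustrative (a linear model, hand-derived gradient, and arbitrary sizes), not the architecture of any cited paper.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8)) @ rng.normal(size=(8, 8))  # correlated data

W = rng.normal(scale=0.1, size=(8, 3))  # encoder weight; decoder is W.T
lr = 1e-3

def recon_loss(W):
    """Mean-squared reconstruction error: the autoencoder training objective."""
    return (((X @ W) @ W.T - X) ** 2).mean()

initial = recon_loss(W)
for _ in range(500):
    R = (X @ W) @ W.T - X                              # reconstruction residual
    grad = 2.0 * (X.T @ R @ W + R.T @ X @ W) / X.size  # d loss / d W
    W -= lr * grad
final = recon_loss(W)
print(final < initial)  # True: reconstruction error decreases
```

Minimizing this pixel-wise (or bin-wise) error is exactly what tends to average out fine detail; a GAN's adversarial loss sidesteps it, which is the higher-fidelity advantage the quote mentions.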
“…Following the success of deep learning, there has been a growing usage of neural network architectures for AAD. In particular, AutoEncoders (AE) are becoming popular for unsupervised AAD [15, 24]. When compared with other ML approaches (e.g., IF and OCSVM), AEs present the advantage of requiring a lower computational effort [19].…”
Section: Introduction
confidence: 99%
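The unsupervised anomaly-detection recipe referenced above, namely fit a model on normal data and flag samples it reconstructs poorly, can be sketched with a linear stand-in for the autoencoder. Everything here (PCA as the "encoder", rank 4, the max-over-training threshold) is an assumption for illustration, not the method of the cited works.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(4, 16))
# "Normal" feature vectors live near a 4-dimensional subspace, plus noise.
train = rng.normal(size=(500, 4)) @ B + 0.01 * rng.normal(size=(500, 16))
anomaly = 5.0 * rng.normal(size=16)              # off-subspace sample

mu = train.mean(axis=0)
_, _, Vt = np.linalg.svd(train - mu, full_matrices=False)
V = Vt[:4]                                       # learned 4-d basis (the "code")

def recon_error(x):
    z = (x - mu) @ V.T                           # encode
    return float(np.linalg.norm(z @ V + mu - x)) # decode and take the residual

# Crude threshold: worst reconstruction error seen on normal data.
threshold = max(recon_error(x) for x in train)
print(recon_error(anomaly) > threshold)  # True: flagged as anomalous
```

A deep AE replaces the SVD basis with non-linear networks, but the detection logic, thresholding the reconstruction error, is the same, and inference is a single forward pass, consistent with the lower computational effort the quote notes.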