2020
DOI: 10.1371/journal.pcbi.1008228

Finding, visualizing, and quantifying latent structure across diverse animal vocal repertoires

Abstract: Animals produce vocalizations that range in complexity from a single repeated call to hundreds of unique vocal elements patterned in sequences unfolding over hours. Characterizing complex vocalizations can require considerable effort and a deep intuition about each species’ vocal behavior. Even with a great deal of experience, human characterizations of animal communication can be affected by human perceptual biases. We present a set of computational methods for projecting animal vocalizations into low dimensional…

Cited by 194 publications (217 citation statements) · References 92 publications

“…This data-driven approach is closely related to previous studies that have applied dimensionality reduction algorithms (UMAP [McInnes et al., 2018] and t-SNE [van der Maaten and Hinton, 2008]) to spectrograms to aid in syllable clustering of birdsong (Sainburg et al., 2019) and visualize juvenile song learning in the zebra finch (Kollmorgen et al., 2020). Additionally, a related recent publication (Sainburg et al., 2020) similarly described the application of UMAP to vocalizations of several more species and the application of the VAE to generate interpolations between birdsong syllables for use in playback experiments. Here, by contrast, we restrict use of the UMAP and t-SNE dimensionality reduction algorithms to visualizing latent spaces inferred by the VAE and use the VAE as a general-purpose tool for quantifying vocal behavior, with a focus on cross-species comparisons and assessing variability across groups, individuals, and experimental conditions.…”
Section: Discussion
confidence: 99%
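The quoted statement describes projecting syllable spectrograms into low-dimensional spaces with UMAP and t-SNE. A minimal sketch of that visualization step, using the umap-learn and scikit-learn libraries on a placeholder spectrogram array (the cited papers' segmentation, preprocessing, and VAE stages are not reproduced here), might look like this:

```python
# Minimal sketch: project syllable spectrograms into 2D with UMAP and t-SNE.
# `spectrograms` is a placeholder (n_syllables, n_freq, n_time) array; real
# pipelines segment, log-scale, and time-align syllables before this step.
import numpy as np
import umap                        # pip install umap-learn
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
spectrograms = rng.random((500, 64, 32))          # placeholder data
X = spectrograms.reshape(len(spectrograms), -1)   # flatten each spectrogram

umap_embedding = umap.UMAP(n_components=2, random_state=0).fit_transform(X)
tsne_embedding = TSNE(n_components=2, random_state=0).fit_transform(X)

# Each row of the embeddings is a 2D point per syllable, which can be
# scatter-plotted and colored by syllable label, individual, or condition.
```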
“…Traditionally, syllable types are annotated based on statistics derived from the segmented syllable spectrogram. Recently, good annotation performance has been achieved with unsupervised methods (Sainburg et al., 2020; Goffinet et al., 2021) and deep neural networks (Koumura and Okanoya, 2016; Cohen et al., 2020). We first trained DAS to annotate the song from four male Bengalese finches (data and annotations from Nicholson et al. (2017)).…”
Section: Results
confidence: 99%
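As an illustration of the unsupervised annotation route mentioned in this statement, the sketch below embeds syllable spectrograms with UMAP and clusters the embedding with HDBSCAN, roughly in the spirit of Sainburg et al. (2020). It is a toy approximation on placeholder data, not DAS and not any cited model's actual pipeline:

```python
# Sketch of unsupervised syllable labeling: embed segmented syllable
# spectrograms with UMAP, then cluster the embedding with HDBSCAN.
import numpy as np
import umap                        # pip install umap-learn
import hdbscan                     # pip install hdbscan

rng = np.random.default_rng(0)
syllables = rng.random((500, 64, 32))       # placeholder for real syllable spectrograms
X = syllables.reshape(len(syllables), -1)

embedding = umap.UMAP(n_components=2, min_dist=0.0, random_state=0).fit_transform(X)
labels = hdbscan.HDBSCAN(min_cluster_size=20).fit_predict(embedding)
# `labels` gives one cluster id (candidate syllable type) per syllable;
# -1 marks syllables left unclustered as outliers.
```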
“…These methods are not only fast and accurate but also easily adapted to novel signals by non-experts since they only require annotated examples for learning. Recently, deep neural networks have also been used for annotating animal vocalizations (Oikarinen et al., 2019; Coffey et al., 2019; Cohen et al., 2020; Sainburg et al., 2020; Arthur et al., 2021; Goffinet et al., 2021).…”
Section: Introduction
confidence: 99%
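To make the "deep neural networks for annotation" idea concrete, here is a hedged PyTorch sketch of a small convolutional classifier that maps fixed-size spectrogram snippets to syllable labels. The architecture, sizes, and class count are illustrative assumptions, not the design of DAS, TweetyNet, or any other cited network:

```python
# Illustrative supervised syllable classifier: a small CNN over fixed-size
# spectrogram snippets, trained with cross-entropy on annotated examples.
import torch
import torch.nn as nn

class SyllableClassifier(nn.Module):
    def __init__(self, n_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_classes)
        )

    def forward(self, x):            # x: (batch, 1, n_freq, n_time)
        return self.head(self.features(x))

model = SyllableClassifier(n_classes=10)   # hypothetical repertoire of 10 syllable types
dummy = torch.randn(4, 1, 64, 32)          # 4 placeholder spectrogram snippets
logits = model(dummy)                      # (4, 10) class scores per snippet
```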