2022
DOI: 10.7554/elife.63853
Automated annotation of birdsong with a neural network that segments spectrograms

Abstract: Songbirds provide a powerful model system for studying sensory-motor learning. However, many analyses of birdsong require time-consuming, manual annotation of its elements, called syllables. Automated methods for annotation have been proposed, but these methods assume that audio can be cleanly segmented into syllables, or they require carefully tuning multiple statistical models. Here we present TweetyNet: a single neural network model that learns how to segment spectrograms of birdsong into annotated syllable…

Cited by 47 publications (56 citation statements)
References 84 publications (132 reference statements)
“…We did so to account for potential circadian effects on song production. We also reassessed our results (shown in Figure 2 ) by analyzing only syllable renditions produced between 6 PM and 8 PM using new methods for automated labeling of song syllables ( Cohen et al, 2022 ). We found no statistically significant difference in learning magnitudes between the two forms of analysis ( Figure 2—figure supplement 7a , 0.167 < P boot < 0.951 on all days of training).…”
Section: Methods
confidence: 81%
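The statement above reports bootstrap p-values (0.167 < P_boot < 0.951) for the comparison of learning magnitudes between all-day and evening-only analyses. As a minimal sketch of that kind of test, the function below computes a two-sided bootstrap p-value for a difference in group means; the data here are illustrative random draws, not values from the cited study, and the exact resampling procedure used by the authors may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learning magnitudes measured two ways: values are
# illustrative only, not data from the cited paper.
all_day = rng.normal(0.8, 0.3, size=20)
evening = rng.normal(0.75, 0.3, size=20)

def p_boot(a, b, n_boot=10_000, rng=rng):
    """Two-sided bootstrap p-value for a difference in group means.

    Resamples each group with replacement and asks how often the
    bootstrapped difference in means falls on either side of zero.
    """
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        diffs[i] = (rng.choice(a, size=a.size).mean()
                    - rng.choice(b, size=b.size).mean())
    # two-sided: twice the smaller tail proportion, capped at 1
    p = 2 * min((diffs <= 0).mean(), (diffs >= 0).mean())
    return min(p, 1.0)

p = p_boot(all_day, evening)
```

A p-value near 1 here, as in the quoted range, is consistent with no detectable difference between the two forms of analysis.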
“…We arbitrarily chose a subset of recordings to annotate from each species, starting with Zonotrichia leucophrys which had the largest dataset, and moving down to Calypte anna which had the smallest dataset. As few as three minutes of recording is sufficient for the TweetyNet algorithm to perform accurately [50], so we ensured that all species had at least 180 seconds worth of annotated syllables. We then added more recordings for species with more data to investigate the influence of sample size.…”
Section: Methods
confidence: 99%
“…With respect to bioacoustics, an extractive algorithm could, for example, segment out syllables within vocalizations. A recently developed application, TweetyNet, was released to perform just this task [50,85,86] using deep learning via ANNs. Specifically, TweetyNet uses convolutional and recurrent ANNs.…”
Section: Avian Bioacoustics
confidence: 99%
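The statement above describes TweetyNet as segmenting vocalizations by labeling spectrogram frames with a convolutional-recurrent network. A key post-processing step in any such frame-classification approach is collapsing the per-frame label sequence into discrete segments with onsets and offsets. The sketch below shows that step in minimal form; it is a generic illustration, and TweetyNet's actual post-processing (in its associated tooling) may differ in details such as smoothing or minimum-duration filtering.

```python
def frames_to_segments(labels, timebin_s=0.002, background=0):
    """Collapse a per-frame label sequence into (onset_s, offset_s, label) tuples.

    `labels` holds one integer class per spectrogram time bin, as a
    frame-classification network would emit; `background` marks
    non-vocalization bins. Times are in seconds.
    """
    segments = []
    start = None
    current = background
    for i, lab in enumerate(labels):
        if lab != background and start is None:
            # a new segment begins at this bin
            start, current = i, lab
        elif start is not None and lab != current:
            # the running segment ends: emit it, maybe start another
            segments.append((start * timebin_s, i * timebin_s, current))
            start = None if lab == background else i
            current = lab
    if start is not None:
        # close a segment that runs to the end of the sequence
        segments.append((start * timebin_s, len(labels) * timebin_s, current))
    return segments

# e.g. two syllables (classes 1 and 2) separated by background bins
segs = frames_to_segments([0, 1, 1, 0, 2, 2, 2], timebin_s=0.01)
```

With 10 ms bins, the example yields a class-1 segment from 0.01 s to 0.03 s and a class-2 segment from 0.04 s to 0.07 s.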