2019
DOI: 10.1121/1.5087827

Deep convolutional network for animal sound classification and source attribution using dual audio recordings

Abstract: This paper introduces an end-to-end feedforward convolutional neural network that is able to reliably classify the source and type of animal calls in a noisy environment using two streams of audio data after being trained on a dataset of modest size and imperfect labels. The data consists of audio recordings from captive marmoset monkeys housed in pairs, with several other cages nearby. The network in this paper can classify both the call type and which animal made it with a single pass through a single networ…
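One way to picture a network that reports both the call type and the caller in a single pass is a flat output vector with one score per (caller, call type) pair. The sketch below only decodes such a vector; it is an illustration under stated assumptions, not the paper's architecture. The label lists are borrowed from the citation statements on this page, and the caller-major layout and the name `decode_joint_output` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical label sets taken from the citation statements on this page,
# not from the paper's own class list.
CALL_TYPES = ["Trill", "Twitter", "Phee", "Chatter"]
CALLERS = ["animal A", "animal B", "other"]


def decode_joint_output(scores: List[float]) -> Tuple[str, str]:
    """Map the highest score in a flat vector to (call type, caller).

    Assumes one score per (caller, call type) pair, laid out caller-major:
    index = caller_index * len(CALL_TYPES) + call_type_index.
    """
    expected = len(CALLERS) * len(CALL_TYPES)
    if len(scores) != expected:
        raise ValueError(f"expected {expected} scores, got {len(scores)}")
    # Index of the largest score (first one wins on ties).
    best = max(range(expected), key=lambda i: scores[i])
    caller_idx, type_idx = divmod(best, len(CALL_TYPES))
    return CALL_TYPES[type_idx], CALLERS[caller_idx]
```

A separate output head per task would be an equally plausible design; the truncated abstract does not say which the authors used.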

Cited by 47 publications (44 citation statements)
References 23 publications (41 reference statements)
“…Wave files were manually aligned type in Audacity ® (v. 2.1.0) software. The data was partially annotated by hand, and partially annotated using an auto-detection algorithm (20). Annotations included call start time, call end time, call type, and caller ID (animal A, animal B, or other).…”
Section: Methods (mentioning)
confidence: 99%
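The annotation fields listed in the statement above (call start time, call end time, call type, caller ID) could be held in a small record type. This is a minimal sketch, not the authors' data format: the class name, field names, and the choice of seconds as the time unit are assumptions.

```python
from dataclasses import dataclass

# Caller IDs as named in the citation statement.
VALID_CALLERS = ("animal A", "animal B", "other")


@dataclass(frozen=True)
class CallAnnotation:
    """One annotated call. Times are assumed to be in seconds."""

    start_s: float
    end_s: float
    call_type: str
    caller_id: str

    def __post_init__(self) -> None:
        # Reject records whose end does not come after their start.
        if self.end_s <= self.start_s:
            raise ValueError("call end time must be after call start time")
        if self.caller_id not in VALID_CALLERS:
            raise ValueError(f"unknown caller ID: {self.caller_id!r}")

    @property
    def duration_s(self) -> float:
        """Length of the call in seconds."""
        return self.end_s - self.start_s
```

For example, `CallAnnotation(1.5, 2.0, "Phee", "animal A").duration_s` gives the call length, and a record with an unrecognized caller ID is rejected at construction time.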
“…In terms of animal voice classification, Zhang et al [39], Oikarinen et al [40] study animal voice classification using deep learning techniques. Our method is different from these two works.…”
Section: Related Work (mentioning)
confidence: 99%
“…Our studies focus on voice classification in noisy environment while the voice data in [39] are collected from controlled room without environmental noise. Instead of classifying different animals, [40] analyses different call types of marmoset monkeys such as Trill, Twitter, Phee and Chatter. Moreover, we implement the proposed system on a testbed and evaluate its performance in real world environment.…”
Section: Related Work (mentioning)
confidence: 99%
“…A convolutional neural network (CNN) is a deep learning technology in which a data array of two or more dimensions, such as an image, is stacked through a plurality of two-dimensional filters. CNNs show high accuracies in image classification and have been recently applied in speech classification [ 25 , 26 , 27 ]. For animal sound classification using CNNs, Xie and Zhu [ 28 ] applied deep learning in classifying Australian bird sounds and reported a classification accuracy of more than 88%.…”
Section: Introduction (mentioning)
confidence: 99%
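The statement above describes a CNN as passing a two-dimensional array through two-dimensional filters. A minimal sketch of that single operation, in plain Python with no padding and a stride of 1, is given below. The function name `conv2d_valid` is hypothetical, and the code computes cross-correlation (no kernel flip), which is what deep learning libraries commonly call convolution.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide a 2-D filter over a 2-D array (no padding, stride 1).

    Each output value is the sum of element-wise products between the
    kernel and the image patch it currently covers.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    if out_h < 1 or out_w < 1:
        raise ValueError("kernel must not be larger than the image")
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]
```

A 3x3 input with a 2x2 filter yields a 2x2 output. For audio, the input array would typically be a spectrogram, though that preprocessing step is not described on this page.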