2020
DOI: 10.1016/j.ecoinf.2020.101084

Data augmentation approaches for improving animal audio classification

Abstract: In this paper we present ensembles of classifiers for automated animal audio classification, exploiting different data augmentation techniques for training Convolutional Neural Networks (CNNs). The specific animal audio classification problems are i) birds and ii) cat sounds, whose datasets are freely available. We train five different CNNs on the original datasets and on their versions augmented by four augmentation protocols, working on the raw audio signals or their representations as spectrograms. We compa…

citations
Cited by 120 publications
(82 citation statements)
references
References 41 publications
“…This idea was extended by introducing more policies such as vertical direction distortion (frequency warping), time length control, and loudness control (Hwang et al., 2020). Other techniques used in image augmentation, e.g., rotation and mixture, are attempted as well (Nanni et al., 2020).…”
Section: Methods To Integrate Human Knowledge (mentioning)
confidence: 99%
“…Recently, some methods have been proposed for audio data augmentation in emotional classification [44] and speech recognition [37], [45]. These data augmentation types are based on the spectrogram of the audio signals [46].…”
Section: B Data Augmentation (mentioning)
confidence: 99%
“…1-b) [43,48,49]. Time masking: $t$ consecutive time steps are masked, replaced by a minimum intensity, at a random point along the spectrogram's time axis in the range of $[t_0, t_0 + t)$, where $t_0$ is chosen from $[0, N - t)$ randomly [45,46]. The vertical black strip with a bandwidth of $t$ in Fig.…”
Section: Time Warping (mentioning)
confidence: 99%
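
The time-masking step quoted above can be illustrated with a minimal sketch. It is not the cited authors' implementation. The function name `time_mask`, the list-of-rows spectrogram layout, the half-open intervals, and the use of the spectrogram's global minimum as the "minimum intensity" are all assumptions made for illustration.

```python
import random


def time_mask(spec, t, rng=None):
    """Return (masked_copy, t0) with t consecutive time steps masked.

    spec: list of rows (one per frequency bin), each a list of N time steps.
    t:    mask width in time steps.
    rng:  optional random.Random instance, for reproducibility.
    """
    if rng is None:
        rng = random.Random()
    n_steps = len(spec[0])
    if not 0 <= t <= n_steps:
        raise ValueError("t must lie in [0, N]")
    # Draw the start index t0 from [0, N - t); if t == N, only t0 = 0 fits.
    t0 = rng.randrange(0, n_steps - t) if t < n_steps else 0
    # Assumption: "minimum intensity" is the smallest value in the spectrogram.
    floor = min(min(row) for row in spec)
    masked = [row[:] for row in spec]  # copy, so the input is left unchanged
    for row in masked:
        for j in range(t0, t0 + t):  # time steps t0 .. t0 + t - 1
            row[j] = floor
    return masked, t0


if __name__ == "__main__":
    spectrogram = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                   [0.5, 2.5, 3.5, 4.5, 5.5, 6.5]]
    masked, start = time_mask(spectrogram, t=2, rng=random.Random(0))
    print("mask starts at time step", start)
    for row in masked:
        print(row)
```

Every frequency bin is masked over the same time range, which produces the vertical strip the quoted passage describes.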
“…A convolutional neural network (CNN) is a deep learning technology in which a data array of two or more dimensions, such as an image, is stacked through a plurality of two-dimensional filters. CNNs show high accuracies in image classification and have been recently applied in speech classification [25, 26, 27]. For animal sound classification using CNNs, Xie and Zhu [28] applied deep learning in classifying Australian bird sounds and reported a classification accuracy of more than 88%.…”
Section: Introduction (mentioning)
confidence: 99%