2019
DOI: 10.1121/1.5118245

Deep metric learning for bioacoustic classification: Overcoming training data scarcity using dynamic triplet loss

Abstract: This paper proposes multiscale convolutional neural network (CNN)-based deep metric learning for bioacoustic classification under low training data conditions. The proposed CNN is characterized by the use of four different filter sizes at each level to analyze input feature maps. This multiscale design helps describe different bioacoustic events effectively: smaller filters help learn the finer details of bioacoustic events, whereas larger filters help analyze a larger context leading…
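The multiscale idea in the abstract (several filter sizes applied in parallel to the same feature map, with the outputs combined) can be illustrated with a minimal NumPy sketch. The kernel sizes 1, 3, 5 and 7, the random kernels, the per-branch ReLU, and the helper names `conv2d_same` and `multiscale_block` are illustrative assumptions, not the paper's configuration; a trained model would learn its kernels inside a deep learning framework.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_same(x, kernel):
    """Single-channel 2-D cross-correlation with zero padding so the output
    has the same shape as the input (assumes a square, odd-sized kernel)."""
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, pad)                        # zero-pad all borders
    windows = sliding_window_view(xp, (k, k))  # shape (H, W, k, k)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def multiscale_block(x, kernels):
    """Apply filters of different sizes in parallel to the same input and
    stack the ReLU-activated responses as output channels (hypothetical helper)."""
    outputs = [np.maximum(conv2d_same(x, k), 0.0) for k in kernels]
    return np.stack(outputs, axis=0)


# Usage: a random 64 x 100 "spectrogram" and one kernel per assumed size.
rng = np.random.default_rng(0)
spectrogram = rng.random((64, 100))
kernels = [rng.standard_normal((s, s)) for s in (1, 3, 5, 7)]
features = multiscale_block(spectrogram, kernels)
print(features.shape)  # (4, 64, 100): one channel per filter size
```

Small kernels respond to local detail while the 7 x 7 kernel aggregates a wider time-frequency neighbourhood, which mirrors the fine-detail versus larger-context argument in the abstract.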


citations
Cited by 44 publications
(20 citation statements)
references
References 34 publications
“…In the work of [30], a triplet sampling was used for generating the triplet spectrograms as the input to CNNs: full spectrogram, harmonic-component based spectrogram, and percussive-component spectrogram. Then, a dynamic triplet loss was used for the classification using multi-scale analysis module.…”
Section: Results (mentioning; confidence: 99%)
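For readers unfamiliar with triplet-based metric learning, the sketch below shows the standard triplet margin loss on batches of anchor, positive, and negative embeddings. It is not the paper's dynamic triplet loss, whose exact formulation is not given here; the margin of 0.2, the squared Euclidean distance, the L2 normalization step, and the function names are assumptions chosen for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row (one embedding) to unit Euclidean length."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss (not the paper's dynamic variant).

    Each argument has shape (batch, dim). The loss pushes the anchor-negative
    squared distance to exceed the anchor-positive one by at least `margin`,
    and returns the mean hinge value over the batch.
    """
    d_ap = np.sum((anchor - positive) ** 2, axis=1)
    d_an = np.sum((anchor - negative) ** 2, axis=1)
    return float(np.mean(np.maximum(d_ap - d_an + margin, 0.0)))


# Usage with random embeddings standing in for CNN outputs.
rng = np.random.default_rng(0)
a, p, n = (l2_normalize(rng.standard_normal((8, 16))) for _ in range(3))
print(triplet_loss(a, p, n))
```

Triplets that already satisfy the margin contribute zero, so how triplets are chosen during training strongly affects how much each batch teaches the network.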
“…The duration of audio files in CLO-43DS data is different, which cannot be directly used as the input to the CNN. The first method for dealing with the multi-variate varying length audio data is that the signal is repeated from the beginning to force the fixed duration of 2s, which has been used in [30]. The second method is to directly resize the audio image to a fixed size.…”
Section: Time-frequency Representation (mentioning; confidence: 99%)
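The two length-normalization strategies described in the statement above can be sketched in NumPy: tiling a clip from its beginning until it reaches a fixed duration, and resizing a time-frequency image to a fixed size. The 22,050 Hz sample rate, truncation of clips already longer than the target, nearest-neighbour interpolation, and the helper names are assumptions for illustration, not details taken from the cited works.

```python
import numpy as np


def repeat_to_length(signal, target_len):
    """Repeat a 1-D signal from its beginning until it has target_len samples.
    Signals longer than target_len are truncated (an assumption)."""
    if signal.size == 0:
        raise ValueError("cannot repeat an empty signal")
    reps = -(-target_len // signal.size)  # ceiling division
    return np.tile(signal, reps)[:target_len]


def resize_nearest(image, out_shape):
    """Nearest-neighbour resize of a 2-D array (e.g. a spectrogram image)."""
    h, w = image.shape
    rows = np.arange(out_shape[0]) * h // out_shape[0]
    cols = np.arange(out_shape[1]) * w // out_shape[1]
    return image[np.ix_(rows, cols)]


# Usage: force a short clip to 2 s at an assumed sample rate of 22,050 Hz.
sr = 22050
clip = np.random.default_rng(0).standard_normal(15000)
fixed = repeat_to_length(clip, 2 * sr)
print(fixed.shape)  # (44100,)
```

Repetition preserves the original time-frequency resolution but duplicates content, while resizing keeps each call intact but stretches or compresses its time axis.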
“…However, it is a time-consuming and expensive endeavor to obtain a manually labeled dataset in bioacoustics, and it may also be very challenging to collect enough labeled data in practice, especially if a species rarely calls or if a species is rare. Given this scenario, some bioacoustics research works used other techniques in addition to CNN, including transfer learning with fine-tuning [36–39], pseudo-labeling [40], and using few-shot learning approaches [41].…”
Section: Related Work (mentioning; confidence: 99%)
“…However, it is a time-consuming and expensive endeavor to obtain a manually labeled dataset in bioacoustics, and it may also be very challenging to collect enough labeled data in practice, especially if a species rarely calls or if a species is rare. Given this scenario, some bioacoustics research works used other techniques in addition to CNN, including transfer learning with fine-tuning [28–30], pseudo-labeling [31], and using few-shot learning approaches [32].…”
Section: Related Work (mentioning; confidence: 99%)