2018
DOI: 10.3390/app8081397

Deep Learning for Audio Event Detection and Tagging on Low-Resource Datasets

Abstract: In training a deep learning system to perform audio transcription, two practical problems may arise. Firstly, most datasets are weakly labelled, having only a list of events present in each recording without any temporal information for training. Secondly, deep neural networks need a very large amount of labelled training data to achieve good quality performance, yet in practice it is difficult to collect enough samples for most classes of interest. In this paper, we propose factorising the final task of audio…

Cited by 41 publications (38 citation statements)
References 25 publications
“…Desjonquères, Rybak, Ulloa, et al., ; Dutilleux & Curé, ) to more refined pattern recognition via cross‐correlation (Ulloa et al., ) as well as more complex methods using machine learning (e.g. Morfi & Stowell, ; Xie, Towsey, Zhang, & Roe, ; Zhang, Towsey, Zhang, & Roe, ). Automatic detection and classification methods usually first involve feature extraction and then classification of these features (Sharan & Moir, ).…”
Section: How to undertake PAM in freshwater
confidence: 99%
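The citing work above describes the usual two-stage pipeline: feature extraction followed by classification of those features. A minimal numpy-only sketch of that idea, using a log-magnitude spectrogram mean-pooled over time as the feature and a nearest-centroid classifier (all names and parameter choices here are illustrative, not from the paper):

```python
import numpy as np

def extract_features(signal, frame_len=256, hop=128):
    """Frame the signal, compute a log-magnitude spectrogram,
    then mean-pool over time into a fixed-length feature vector."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    spec = np.abs(np.fft.rfft(np.stack(frames) * np.hanning(frame_len), axis=1))
    return np.log1p(spec).mean(axis=0)  # shape: (frame_len // 2 + 1,)

def nearest_centroid(train_feats, train_labels, query_feat):
    """Classify a feature vector by distance to per-class feature centroids."""
    classes = sorted(set(train_labels))
    centroids = {c: np.mean([f for f, l in zip(train_feats, train_labels) if l == c],
                            axis=0)
                 for c in classes}
    return min(classes, key=lambda c: np.linalg.norm(query_feat - centroids[c]))
```

Real systems replace both stages (e.g. mel filterbank features and a neural classifier), but the detect-features-then-classify structure is the same.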
“…A few examples of the NIPS4Bplus dataset and temporal annotations being used can be found in (Morfi and Stowell, 2018a) and (Morfi and Stowell, 2018b). First, in (Morfi and Stowell, 2018a), we use NIPS4Bplus to carry out the training and evaluation of a newly proposed multi-instance learning (MIL) loss function for audio event detection.…”
Section: Example uses of NIPS4Bplus
confidence: 99%
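The citation above refers to a multi-instance learning (MIL) loss for training on weak, recording-level labels. As a generic illustration of the MIL idea (not the specific loss proposed in Morfi and Stowell, 2018a), per-frame event probabilities can be pooled into one bag-level probability and scored against the weak label:

```python
import numpy as np

def mil_bag_loss(instance_probs, bag_label, eps=1e-7):
    """Generic MIL loss sketch: max-pool per-frame event probabilities
    into a single bag probability (the bag is positive if any frame is),
    then apply binary cross-entropy against the recording-level label."""
    bag_prob = np.clip(np.max(instance_probs), eps, 1 - eps)
    return -(bag_label * np.log(bag_prob)
             + (1 - bag_label) * np.log(1 - bag_prob))
```

Max pooling is only one choice of pooling function; softer alternatives (mean, attention-weighted) are common because max passes gradient to a single frame.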
“…Usually, bird vocalizations are segmented to improve the performance of the classifier. However, these segmentation algorithms are commonly too simple for real conditions in the field, or follow a supervised learning scheme where a lot of manual work has to be done to label the vocalizations used for training [4].…”
Section: Introduction
confidence: 99%