2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016
DOI: 10.1109/cvpr.2016.513

WELDON: Weakly Supervised Learning of Deep Convolutional Neural Networks

Abstract: Audio tagging aims to perform multi-label classification on audio chunks and it is a newly proposed task in the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge. This task encourages research efforts to better analyze and understand the content of the huge amounts of audio data on the web. The difficulty in audio tagging is that it only has a chunk-level label without a frame-level label. This paper presents a weakly supervised method to not only predict the tags but also …

Citations: cited by 138 publications (140 citation statements)
References: 46 publications
“…R-CNN [21], or scene understanding [27,25,43,12]. Since this approach is highly inefficient, there have been extensive attempts for using convolutional layers to share feature computation, for image classification [44,13,70], object detection [22,20,52] or image segmentation [8,42]. However, fully connected layers are beneficial in standard deep architectures, e.g.…”
Section: Related Work (mentioning)
confidence: 99%
“…The standard max-pooling MIL approach [44] is obtained with only one element, and both top instance model [39], Learning with Label Proportion [65] and global average pooling [70] can be obtained with more. Drawing from negative evidence [47,12,13] we can incorporate minimum scoring regions to support classification and our spatial pooling function can reduce to the kMax+kMin layer of [13].…”
Section: Wildcat Pooling (mentioning)
confidence: 99%
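
The pooling family described in the statement above can be illustrated with a minimal sketch. It is not the cited authors' implementation: the function name `kmax_kmin_pool`, the averaging of the selected scores, and the weight `alpha` on the lowest-scoring regions are all assumptions made for illustration. It takes a flat list of region scores for one class and combines the highest-scoring regions with the lowest-scoring ones (the "negative evidence").

```python
from typing import Sequence


def kmax_kmin_pool(
    scores: Sequence[float],
    k_max: int = 1,
    k_min: int = 0,
    alpha: float = 1.0,  # hypothetical weight on the lowest-scoring regions
) -> float:
    """Pool region scores into one class score (illustrative sketch only).

    Returns the mean of the k_max highest scores plus alpha times the
    mean of the k_min lowest scores. Averaging and alpha are assumptions.
    """
    n = len(scores)
    if not 0 < k_max <= n:
        raise ValueError("k_max must be between 1 and len(scores)")
    if not 0 <= k_min <= n:
        raise ValueError("k_min must be between 0 and len(scores)")

    ordered = sorted(scores, reverse=True)  # highest score first
    top_mean = sum(ordered[:k_max]) / k_max
    if k_min == 0:
        return top_mean  # no negative evidence requested

    low_mean = sum(ordered[n - k_min:]) / k_min  # the k_min lowest scores
    return top_mean + alpha * low_mean


if __name__ == "__main__":
    region_scores = [0.9, -0.2, 0.4, -0.7]
    # One top element: behaves like max-pooling MIL.
    print(kmax_kmin_pool(region_scores, k_max=1))
    # All elements, no minimum term: behaves like global average pooling.
    print(kmax_kmin_pool(region_scores, k_max=len(region_scores)))
    # Top-2 plus bottom-2: a kMax+kMin style aggregation.
    print(kmax_kmin_pool(region_scores, k_max=2, k_min=2))
```

Under these assumptions, `k_max=1, k_min=0` reduces to plain max pooling, and `k_max=len(scores), k_min=0` reduces to global average pooling, which mirrors the special cases the statement lists.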