WELDON: Weakly Supervised Learning of Deep Convolutional Neural Networks

Durand, Thibaut; Thome, Nicolas; Cord, Matthieu

doi:10.1109/cvpr.2016.513

Cited by 138 publications

(140 citation statements)

References 46 publications

Supporting

Mentioning

138

Contrasting


“…R-CNN [21], or scene understanding [27,25,43,12]. Since this approach is highly inefficient, there have been extensive attempts for using convolutional layers to share feature computation, for image classification [44,13,70], object detection [22,20,52] or image segmentation [8,42]. However, fully connected layers are beneficial in standard deep architectures, e.g.…”

Section: Related Work (mentioning)

confidence: 99%

“…The standard max-pooling MIL approach [44] is obtained with only one element, and both top instance model [39], Learning with Label Proportion [65] and global average pooling [70] can be obtained with more. Drawing from negative evidence [47,12,13] we can incorporate minimum scoring regions to support classification and our spatial pooling function can reduce to the kMax+kMin layer of [13].…”

Section: Wildcat Pooling (mentioning)

confidence: 99%
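The pooling family described in the statement above can be illustrated with a minimal NumPy sketch. It assumes a 2-D map of per-region scores for one class; the function name `kmax_kmin_pool`, the averaging of the selected scores, and the `alpha` weight on the lowest-scoring regions are illustrative assumptions, not the authors' released code.

```python
import numpy as np


def kmax_kmin_pool(scores, k_max=1, k_min=0, alpha=1.0):
    """Pool a map of region scores into a single image-level score.

    Hypothetical helper for illustration only:
    - mean of the k_max highest-scoring regions (positive evidence)
    - plus alpha times the mean of the k_min lowest-scoring regions
      (negative evidence); skipped when k_min == 0.
    """
    flat = np.sort(np.asarray(scores, dtype=float).ravel())  # ascending order
    n = flat.size
    if not (1 <= k_max <= n and 0 <= k_min <= n):
        raise ValueError("k_max must be in [1, n] and k_min in [0, n]")
    top = flat[n - k_max:].mean()          # highest-scoring regions
    bottom = flat[:k_min].mean() if k_min > 0 else 0.0  # lowest-scoring regions
    return top + alpha * bottom


if __name__ == "__main__":
    region_scores = np.array([[1.0, 2.0], [3.0, -4.0]])
    # One element: standard max pooling.
    print(kmax_kmin_pool(region_scores, k_max=1, k_min=0))
    # All elements: global average pooling.
    print(kmax_kmin_pool(region_scores, k_max=region_scores.size, k_min=0))
    # One top and one bottom region: a kMax+kMin-style score.
    print(kmax_kmin_pool(region_scores, k_max=1, k_min=1))
```

Under these assumptions, setting `k_max=1, k_min=0` recovers max pooling, setting `k_max` to the number of regions recovers global average pooling, and a nonzero `k_min` adds the minimum-scoring regions as negative evidence.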

“…Recent alternatives include Global Average Pooling (GAP) [70], soft max in LSE pooling [58], Learning from Label Proportion (LLP) [65,36], and top max scoring [39]. Negative evidence models [47,12,13] explicitly select regions accounting for the absence of the class. In WILDCAT, we propose to incorporate negative evidence insights, but with a differentiated positive and negative contribution process.…”

Section: Related Work (mentioning)

confidence: 99%

“…To optimally perform domain adaptation in this context, it becomes necessary to align informative image regions, e.g. by detecting objects [44,29] parts [68,69,70,35] or context [23,13]. Although some works incorporate more precise annotations during training, e.g.…”

Section: Introduction (mentioning)

confidence: 99%

“…We propose a new pooling strategy (right of Figure 2) which generalizes several approaches in the literature, including (top) max pooling [44,39], global average pooling [70] or negative evidence models [47,12,13].…”

Section: Introduction (mentioning)

confidence: 99%

WILDCAT: Weakly Supervised Learning of Deep ConvNets for Image Classification, Pointwise Localization and Segmentation

Durand

Mordan

Thome

et al. 2017

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Self Cite

313

285

This paper introduces WILDCAT, a deep learning method which jointly aims at aligning image regions for gaining spatial invariance and learning strongly localized features. Our model is trained using only global image labels and is devoted to three main visual recognition tasks: image classification, weakly supervised pointwise object localization and semantic segmentation. WILDCAT extends state-of-the-art Convolutional Neural Networks at three major levels: the use of Fully Convolutional Networks for maintaining spatial resolution, the explicit design in the network of local features related to different class modalities, and a new way to pool these features to provide a global image prediction required for weakly supervised training. Extensive experiments show that our model significantly outperforms the state-of-the-art methods.

W-TALC: Weakly-Supervised Temporal Activity Localization and Classification

Shrivastava

Roy

Roy-Chowdhury

2018

Lecture Notes in Computer Science

276

335

Most activity localization methods in the literature suffer from the burden of requiring frame-wise annotation. Learning from weak labels may be a potential solution for reducing such manual labeling effort. Recent years have witnessed a substantial influx of tagged videos on the Internet, which can serve as a rich source of weakly-supervised training data. Specifically, the correlations between videos with similar tags can be utilized to temporally localize the activities. Towards this goal, we present W-TALC, a Weakly-supervised Temporal Activity Localization and Classification framework using only video-level labels. The proposed network can be divided into two sub-networks, namely the Two-Stream based feature extractor network and a weakly-supervised module, which we learn by optimizing two complementary loss functions. Qualitative and quantitative results on two challenging datasets, Thumos14 and ActivityNet1.2, demonstrate that the proposed method is able to detect activities at a fine granularity and achieve better performance than current state-of-the-art methods. Code is available at https://github.com/sujoyp/wtalc-pytorch
