2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019
DOI: 10.1109/iccv.2019.00978

Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection

Abstract: Learning to localize and name object instances is a fundamental problem in vision, but state-of-the-art approaches rely on expensive bounding box supervision. While weakly supervised detection (WSOD) methods relax the need for boxes to that of image-level annotations, even cheaper supervision is naturally available in the form of unstructured textual descriptions that users may freely provide when uploading image content. However, straightforward approaches to using such data for WSOD wastefully discard captio…

Cited by 49 publications (48 citation statements)
References 37 publications
“…Note that the motivation of our proposed pseudo-supervised learning is conceptually different from the weakly supervised learning methods [31]-[35]. Weakly supervised learning methods are mostly used in object detection where manually annotated labels are heavily required and costly.…”
Section: Learning Framework (mentioning)
confidence: 99%
“…Recent studies have also explored the related task of weakly supervised object detection (WSOD) using only image captions as supervision [44], [45]. Similar in spirit to our proposed caption processing module, these studies have explored methods for extracting useful visual information explicitly from captions as a structured set of labels.…”
Section: Weakly Supervised Object Detection Using Image Captions (mentioning)
confidence: 99%
“…TAM-NET [40] utilizes text to generate text activation maps, which can be used for augmenting class activation map in segmentation task. Cap2Det [52] leverages the signal that captions provide for weakly supervised detection. However, caption-enhanced image segmentation models are still inadequately explored in the literature.…”
Section: Related Work (mentioning)
confidence: 99%
“…Let us first briefly summarize the manipulation of image captions in most relevant works Cap2Det [52] or TAM [40] for image segmentation or detection. The input to caption processor is obtained by encoding each word with a word2vec model and average pooling over words, often intertwined with fully-connected layers for fine-tuning.…”
Section: Visual Occurrence Estimation By Contextual Entailment (mentioning)
confidence: 99%
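
The caption-processing step described in the statement above (per-word embeddings, average pooling over words, then fully-connected layers) can be illustrated with a minimal sketch. This is not the Cap2Det or TAM implementation: the toy embedding table stands in for a pretrained word2vec model, and the names `EMBEDDINGS`, `encode_caption`, `fully_connected`, and the layer sizes are assumptions made only for illustration.

```python
import numpy as np

# Hypothetical toy embedding table standing in for a pretrained word2vec model.
rng = np.random.default_rng(0)
VOCAB = ["a", "dog", "chasing", "frisbee", "park"]
EMBED_DIM = 8  # assumed embedding size, chosen only for illustration
EMBEDDINGS = {word: rng.normal(size=EMBED_DIM) for word in VOCAB}


def encode_caption(caption, embeddings=EMBEDDINGS, dim=EMBED_DIM):
    """Average-pool the vectors of in-vocabulary words; zeros if none match."""
    vectors = [embeddings[w] for w in caption.lower().split() if w in embeddings]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)


def fully_connected(x, weight, bias):
    """One dense layer with ReLU, standing in for the fine-tuning layers."""
    return np.maximum(weight @ x + bias, 0.0)


NUM_OUTPUTS = 4  # assumed output size, chosen only for illustration
W = rng.normal(size=(NUM_OUTPUTS, EMBED_DIM))
b = np.zeros(NUM_OUTPUTS)

# Pool the word vectors of one caption, then pass them through the dense layer.
caption_vector = encode_caption("A dog chasing a frisbee")
features = fully_connected(caption_vector, W, b)
```

In a trained system the dense-layer weights would be learned jointly with the downstream task rather than drawn at random as they are here.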