2018
DOI: 10.1007/978-3-030-01246-5_24
Zero-Shot Object Detection

Abstract: We introduce and tackle the problem of zero-shot object detection (ZSD), which aims to detect object classes which are not observed during training. We work with a challenging set of object classes, not restricting ourselves to similar and/or fine-grained categories as in prior works on zero-shot classification. We present a principled approach by first adapting visual-semantic embeddings for ZSD. We then discuss the problems associated with selecting a background class and motivate two background-aware approa…

Cited by 253 publications (350 citation statements) · References 71 publications
“…Our empirical results indicate that such an enforcement of the proper grounding of all phrases via caption-conditioned image representations ( Figure 2) does indeed lead to a better phrase localization performance ( Table 3, 4). Moreover, we also show that the proposed discriminative representation allows us to achieve results that are comparable to the state-of-the-art on the downstream image-caption matching task on both COCO and Flickr30k datasets ( Table 2).…”
Section: Introduction — supporting
confidence: 52%
“…Corner-Net achieved a 42.1% AP on MS COCO, outperforming all previous one-stage detectors; however, the average inference time is about 4 FPS on a Titan X GPU, significantly slower than SSD [175] and YOLO [227].…”
[Footnote 8: Boxes of various sizes and aspect ratios that serve as object candidates. Footnote 9: The idea of using keypoints for object detection appeared previously in DeNet [269].]
Section: Unified (One Stage) Framework — mentioning
confidence: 99%
“…Co-Localization: COCO Dataset Creation and Faster-RCNN Training. The COCO dataset has 80 classes in total. We take the same 17 unseen classes used in the zero-shot object detection paper [Ref1] and keep the remaining 63 classes for training. The training set is constructed from the images in the COCO 2017 train set that contain at least one object from the seen classes.…”
Section: Appendix: Learning To Find Common Objects Across Few Image Co… — mentioning
confidence: 99%
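The seen/unseen split this excerpt describes — keep only training images that contain at least one seen-class object — can be sketched in a few lines. The annotation structure, class names, and the helper `build_training_set` below are illustrative assumptions, not the actual COCO metadata or the cited paper's code:

```python
# Hypothetical sketch of the seen/unseen training split described above.
# Annotation dicts and category names are toy stand-ins for COCO data.

def build_training_set(images, annotations, unseen_classes):
    """Keep only images containing at least one seen-class object."""
    unseen = set(unseen_classes)
    kept = []
    for img in images:
        anns = [a for a in annotations if a["image_id"] == img["id"]]
        classes = {a["category"] for a in anns}
        # the image qualifies if any annotated object is from a seen class
        if classes - unseen:
            kept.append(img)
    return kept

images = [{"id": 1}, {"id": 2}, {"id": 3}]
annotations = [
    {"image_id": 1, "category": "dog"},    # seen
    {"image_id": 2, "category": "zebra"},  # unseen only -> image dropped
    {"image_id": 3, "category": "cat"},    # seen
    {"image_id": 3, "category": "zebra"},
]
train = build_training_set(images, annotations, ["zebra"])
print([img["id"] for img in train])  # → [1, 3]
```

In practice the same filtering would be done with the COCO annotation files (e.g. via pycocotools) rather than hand-built dicts.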
“…The COCO test set is built by combining the unused images of the train set with images in the COCO validation set that contain at least one object from the unseen classes. Similar to [Ref1], to avoid training the network to classify unseen objects as background, we remove objects of unseen classes from the training images using their ground-truth segmentation masks.…”
Section: Appendix: Learning To Find Common Objects Across Few Image Co… — mentioning
confidence: 99%
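The mask-based erasure step described above — blanking out unseen-class pixels so the detector never learns them as either foreground or background — can be sketched as follows. The array shapes, the fill value of 0, and the function name are assumptions for illustration, not the cited paper's exact procedure:

```python
import numpy as np

# Illustrative sketch: zero out every pixel covered by an unseen-class
# ground-truth segmentation mask. A real pipeline might instead inpaint
# or mean-fill the region; the choice of fill value here is an assumption.

def erase_unseen_objects(image, unseen_masks, fill_value=0):
    """Blank out pixels belonging to unseen-class instances."""
    out = image.copy()
    for mask in unseen_masks:          # each mask: H x W boolean array
        out[mask] = fill_value
    return out

image = np.full((4, 4), 255, dtype=np.uint8)   # toy grayscale image
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True                  # an unseen object covers a 2x2 patch
cleaned = erase_unseen_objects(image, [mask])
print(int(cleaned[1, 1]), int(cleaned[0, 0]))  # → 0 255
```

With real COCO data, each boolean mask would come from the instance's polygon or RLE annotation (e.g. decoded with pycocotools' `annToMask`).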