2017 IEEE International Conference on Computer Vision (ICCV) 2017
DOI: 10.1109/iccv.2017.200

Temporal Dynamic Graph LSTM for Action-Driven Video Object Detection

Abstract: In this paper, we investigate a weakly-supervised object detection framework. Most existing frameworks focus on using static images to learn object detectors. However, these detectors often fail to generalize to videos because of the existing domain shift. Therefore, we investigate learning these detectors directly from boring videos of daily activities. Instead of using bounding boxes, we explore the use of action descriptions as supervision since they are relatively easy to gather. A common issue, however, i…

Citations: cited by 82 publications (54 citation statements)
References: 45 publications
“…There are on average 6.8 action labels for a video. The official Charades dataset doesn't provide object bounding box annotations and we use the annotations released by [50]. In the released annotations, 1,812 test videos are down-sampled to 1 frame per second (fps) and 17 object classes are labeled with bounding boxes on these frames.…”
Section: Methods (citation type: mentioning)
confidence: 99%
“…We report per-class average precision (AP) at intersection-over-union (IoU) of 0.5 between detection and ground truth boxes, and also mean AP (mAP) as a combined metric, following the tradition of [50]. We also report CorLoc [9], a commonly-used weakly supervised detection metric.…”
Section: Methods (citation type: mentioning)
confidence: 99%
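
The two metrics named in the citation statement above both rest on the IoU test at a threshold of 0.5. As a rough illustration only, and not the cited authors' evaluation code, the sketch below computes IoU for two boxes and a simple CorLoc score for one object class. It assumes boxes are given as (x1, y1, x2, y2) corners and that each frame contributes a single top-scoring predicted box; the names `iou`, `corloc`, `top_boxes`, and `gt_boxes` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes yield an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def corloc(
    top_boxes: Dict[str, Optional[Box]],
    gt_boxes: Dict[str, List[Box]],
    thresh: float = 0.5,
) -> float:
    """Fraction of frames containing the class whose top-scoring
    prediction overlaps some ground-truth box with IoU >= thresh."""
    # Only frames that actually contain the class are counted.
    frames = [f for f, gts in gt_boxes.items() if gts]
    if not frames:
        return 0.0
    hits = 0
    for f in frames:
        pred = top_boxes.get(f)
        if pred is not None and any(iou(pred, g) >= thresh for g in gt_boxes[f]):
            hits += 1
    return hits / len(frames)


if __name__ == "__main__":
    # Hypothetical toy data: one correct and one incorrect localization.
    preds = {"frame_a": (0.0, 0.0, 2.0, 2.0), "frame_b": (0.0, 0.0, 1.0, 1.0)}
    gts = {"frame_a": [(0.0, 0.0, 2.0, 2.0)], "frame_b": [(5.0, 5.0, 6.0, 6.0)]}
    print(corloc(preds, gts))
```

Per-class AP additionally ranks all detections by score and integrates precision over recall, which this sketch leaves out; only the shared IoU >= 0.5 matching step is shown.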