2016
DOI: 10.1007/978-3-319-46448-0_31
Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding

Abstract: Computer vision has great potential to help in our daily lives: searching for lost keys, watering flowers, or reminding us to take a pill. To succeed at such tasks, computer vision methods need to be trained on real and diverse examples of our daily dynamic scenes. Most of these scenes are not particularly exciting, so they typically do not appear on YouTube, in movies, or in TV broadcasts. So how do we collect sufficiently many diverse but boring samples representing our lives? We propose a novel Hollywood …

Cited by 675 publications (446 citation statements)
References 35 publications
“…Thus we believe that our problem is a natural reflection of the kinds of learning that people employ to learn to recognize newly named objects. Contemporary to our work, Sigurdsson et al. (2016) proposed an interesting new dataset, Charades, in which hundreds of people record videos in their homes acting out casual everyday activities. We leave the application of our method to this dataset for future work.…”
Section: Discussion (citation type: mentioning; confidence: 99%)
“…To evaluate the effectiveness of our temporal reasoning graph, we perform extensive experiments on three benchmark datasets for activity recognition: Something-Something V1 [9], Something-Something V2 [16], and Charades [29]. We first introduce these datasets and the implementation details.…”
Section: Methods (citation type: mentioning; confidence: 99%)
“…Charades [25] is an untrimmed, multi-action dataset containing 11,848 videos split into 7,985 for training, 1,863 for validation, and 2,000 for testing. It has 157 action categories, with several fine-grained categories.…”
Section: Datasets (citation type: mentioning; confidence: 99%)
“…In the classification task, we concatenate the two-stream features and apply a sliding-window pooling scheme to create multiple descriptors. Following the evaluation protocol in [25], we use the output probability of the classifier as the score of the sequence. In the detection task, we adopt the evaluation method with post-processing proposed in [81], which uses the averaged prediction score of a temporal window around each temporal pivot.…”
Section: Action Recognition/Detection in Untrimmed Videos (citation type: mentioning; confidence: 99%)
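The detection post-processing quoted above (averaging prediction scores in a temporal window around evenly spaced pivots) can be sketched roughly as follows. This is a hypothetical illustration, not the cited authors' code; the function name `pivot_scores`, the number of pivots, and the window size are assumptions for the sake of the example.

```python
import numpy as np

def pivot_scores(frame_scores, num_pivots=25, window=10):
    """Average per-frame class scores in a temporal window around
    equally spaced pivots (a sketch of the described post-processing).

    frame_scores: array of shape (T, C), per-frame class probabilities.
    Returns an array of shape (num_pivots, C), one averaged score
    vector per pivot.
    """
    T, C = frame_scores.shape
    # Equally spaced pivot frames over the video's length.
    pivots = np.linspace(0, T - 1, num_pivots).astype(int)
    out = np.empty((num_pivots, C))
    for i, p in enumerate(pivots):
        # Clamp the window to the video boundaries.
        lo, hi = max(0, p - window), min(T, p + window + 1)
        out[i] = frame_scores[lo:hi].mean(axis=0)
    return out

# Example: a 300-frame video scored over the 157 Charades classes.
scores = np.random.rand(300, 157)
print(pivot_scores(scores).shape)  # (25, 157)
```

Averaging over a window rather than taking the single frame at each pivot smooths out per-frame noise, which is the stated purpose of the post-processing step.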