2012
DOI: 10.21236/ada570728

Mid-level Features Improve Recognition of Interactive Activities

Abstract: We argue that mid-level representations can bridge the gap between existing low-level models, which are incapable of capturing the structure of interactive verbs, and contemporary high-level schemes, which rely on the output of potentially brittle intermediate detectors and trackers. We develop a novel descriptor based on generic object foreground segments; our representation forms a histogram-of-gradient representation that is grounded to the frame of detected key-segments. Importantly, our method does not re…

Citations: cited by 8 publications (4 citation statements)
References: 44 publications
“…However, with the similarity constraints added, our full model dramatically improves on the novel target categories. This again validates our argument that the auxiliary similarity constraints can be used in conjunction with a domain adaptation algorithm to learn a more generalizable target model.…” [Table 4. PASCAL to VisInt dataset description [25]]
Section: Results and Analysis (supporting)
confidence: 87%
“…Our experiments focus on person detectors, due to the wide interest in and broad applications of pedestrian detection. The source domain has images from the PASCAL VOC 2007 dataset [10], and the target domain consists of frames of the videos from the VisInt dataset [25].…”
Section: Object Detection In Video (mentioning)
confidence: 99%
“…It is a challenge to obtain high-level descriptions from videos, or to combine empirical measurements with expert knowledge and bridge the gap between low-level features and high-level descriptions. Saenko [40] proposed a mid-level representations, that can bridge the gap between existing low-level models, which are incapable of capturing the structure of interactive verbs, and contemporary high-level schemes, which rely on the output of potentially brittle intermediate detectors and trackers. Sadanand [39] presented Action Bank, a high-level representation of video.…”
Section: Literature Survey (mentioning)
confidence: 99%
“…It seems intuitively clear that machine vision models should also capture the above reasoning structure, and indeed this has been explored in the past (Gupta & Davis, 2007; Saenko et al, 2012). However, current state-of-the-art video transformer models do not explicitly model objects.…”
Section: Introduction (mentioning)
confidence: 99%