2018 IEEE Winter Conference on Applications of Computer Vision (WACV) 2018
DOI: 10.1109/wacv.2018.00181

Scaling Human-Object Interaction Recognition Through Zero-Shot Learning

Cited by 146 publications (130 citation statements)
References 17 publications
“…However, with the introduction of datasets with a larger vocabulary of objects and predicates [6,23], visual phrase approaches have been facing severe difficulties as most relations have very few training examples. Compositional methods [9,11,17,27,30,33,42], which allow sharing knowledge across triplets, have scaled better but do not cope well with unseen relations. To increase the expressiveness of the generic compositional detectors, recent works have developed models of statistical dependencies between the subject, object and predicate, using, for example, graphical models [7,24], language distillation [45], or semantic context [48].…”
Section: Related Work (mentioning)
confidence: 99%
“…Chao et al [17] set the benchmark in HICO-DET based on a three-stream detection framework, exploiting the visual and spatial representations of human, object and the pairwise bounding box. Shen et al [32] analyzed the zero-shot problem with separate verb and object detection losses. Zhuang et al [23] addressed the long-tail issue with supervision from web data.…”
Section: Related Work (mentioning)
confidence: 99%
“…Desai and Ramanan (2012) propose a compositional model that uses human pose and interacting objects to predict human actions, but the visual phraselets and tree structure they use are too simple to capture sophisticated HOI relations in large datasets. In connection with neural networks, Shen et al (2018) ...…”
Section: Combination of Action Recognition and Pose Estimation (mentioning)
confidence: 99%
“…The proposed method achieves the state-of-the-art results on two public benchmarks including V-COCO and HICO-DET use spatial relations between human and object positions to recognize HOIs. Shen et al (2018) focus on the difficulty of obtaining all the possible HOI samples in reality, and propose a zero-shot learning method to tackle with the lack of data problem.…”
Section: Introduction (mentioning)
confidence: 99%