2019 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2019.00977
No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques

Abstract: We show that for human-object interaction detection a relatively simple factorized model with appearance and layout encodings constructed from pre-trained object detectors outperforms more sophisticated approaches. Our model includes factors for detection scores, human and object appearance, and coarse (box-pair configuration) and optionally fine-grained layout (human pose). We also develop training techniques that improve learning efficiency by: (1) eliminating a train-inference mismatch; (2) rejecting easy n…
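To make the factorization described in the abstract concrete, here is a minimal sketch of how such a score could be composed: a detection-score factor gating learned factors over human appearance, object appearance, and box-pair layout. This is an illustrative reading of the abstract only, not the paper's implementation; the dimensions and names (`appearance_dim`, `layout_dim`, `FactoredHOIScorer`) are assumed placeholders.

```python
import torch
import torch.nn as nn

class FactoredHOIScorer(nn.Module):
    """Hypothetical sketch of a factorized HOI scorer: per-interaction logits
    from human appearance, object appearance, and coarse box-pair layout are
    combined and gated by the object-detector confidence."""

    def __init__(self, appearance_dim=2048, layout_dim=8, num_interactions=600):
        super().__init__()
        self.human_factor = nn.Linear(appearance_dim, num_interactions)
        self.object_factor = nn.Linear(appearance_dim, num_interactions)
        self.layout_factor = nn.Sequential(
            nn.Linear(layout_dim, 128), nn.ReLU(),
            nn.Linear(128, num_interactions))

    def forward(self, human_feat, object_feat, layout_feat, det_score):
        # Sum per-factor logits (a product of factors in probability space),
        # then weight by the pre-trained detector's confidence for the pair.
        logits = (self.human_factor(human_feat)
                  + self.object_factor(object_feat)
                  + self.layout_factor(layout_feat))
        return det_score * torch.sigmoid(logits)
```

A fine-grained layout factor (e.g. human-pose keypoints) could be added as another additive term in the same way, which is how the abstract describes the optional pose input.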


Cited by 112 publications (131 citation statements)
References 20 publications
“…In addition, the approaches in (Gupta, Schwing, and Hoiem 2019) and (Li et al 2019) require pose estimation models too. The numbers listed in table 1 do not count these parameters.…”
Section: Results (mentioning)
Confidence: 99%
“…As mentioned before, V-COCO (Gupta and Malik 2015) is a small dataset and does not provide any insights into the proposed method. In line with recent work (Gupta, Schwing, and Hoiem 2019), we avoid using it.…”
Section: Methods (mentioning)
Confidence: 92%
“…Computer vision (CV)-based human motion modelling and analysis has been extensively researched by the community. But, most of the research can be categorised into pose estimation [160], human-object interaction [63,98], activity/gesture recognition [31,65,113] or human-human interaction [53]. However, comparative analysis of human motion has received relatively less attention from the community.…”
Section: Introduction (mentioning)
Confidence: 99%
“…Most existing works on HOI detection [9,11,14,25,36,39,41] treat HOIs as individual interaction categories and focus on mining visual representations of human-object pairs to improve classification performances. Despite previous successes, these conventional Figure 2: Polysemy of action labels.…”
Section: Introduction (mentioning)
Confidence: 99%
“…As an example shown in Figure 2, collocated with different objects, the actual implications of action "ride" are sometimes inconsistent. Such phenomenon brings ambiguities and extra challenges to compositional methods [1,11,14,36].…”
Section: Introduction (mentioning)
Confidence: 99%