Proceedings of the Third Workshop on Vision and Language 2014
DOI: 10.3115/v1/w14-5403

TUHOI: Trento Universal Human Object Interaction Dataset

Abstract: This paper describes the Trento Universal Human Object Interaction dataset (TUHOI), which is dedicated to human-object interactions in images. Recognizing human actions is an important yet challenging task. Most available datasets in this field are limited in the number of actions and objects they cover. A large dataset with varied actions and human-object interactions is needed for training and evaluating complex and robust human action recognition systems, especially systems that combine knowledge learned from language…
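
The abstract frames each example as an action verb grounded in an image together with the object it applies to. As a rough sketch only (the page does not show TUHOI's actual annotation format, so every name below is hypothetical), such ⟨verb, object⟩ records and a per-verb index for evaluation might look like this:

```python
from dataclasses import dataclass

@dataclass
class HOIAnnotation:
    # One human-object interaction in an image. Field names are
    # illustrative only, not TUHOI's actual release format.
    image_id: str
    verb: str          # action performed by the person, e.g. "ride"
    object_label: str  # object the action applies to, e.g. "bicycle"

def group_by_verb(annotations):
    # Index annotations by action verb, e.g. for per-action evaluation.
    index = {}
    for ann in annotations:
        index.setdefault(ann.verb, []).append(ann)
    return index

# Made-up example records:
anns = [
    HOIAnnotation("img_001", "ride", "bicycle"),
    HOIAnnotation("img_002", "throw", "bag"),
    HOIAnnotation("img_001", "hold", "helmet"),
]
print(sorted(group_by_verb(anns)))  # ['hold', 'ride', 'throw']
```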

Cited by 32 publications (19 citation statements)
References 16 publications

“…The events in (i) and (j) that correspond to 'throwing a bag' and 'pushing a bicycle' include the interaction of objects, which could be captured with our approach. Since large datasets for object interactions are available [5,21,24], our framework could be extended to learn such knowledge, and this could be another direction for future work.…”
Section: Qualitative Evaluation of Evidence Recounting
Confidence: 99%

“…Although they also present alternative models that use text corpora for descriptions that are more human-like, they are limited to verbs and do not cover prepositions. Le et al. (2014) examine prepositions modifying human actions (verbs), and conclude that these relate to positional information to a certain extent. Other related work includes training classifiers for prepositions with spatial relation features to improve image segmentation and detection (Fidler et al., 2013); this work is, however, limited to four prepositions.…”
Section: Related Work
Confidence: 99%

“…As a secondary evaluation dataset for the Multimodal Translation task, we collected and translated a set of image descriptions that potentially contain ambiguous verbs. We based our selection on the VerSe dataset (Gella et al., 2016), which annotates a subset of the COCO (Lin et al., 2014) and TUHOI (Le et al., 2014) images with OntoNotes senses for 90 verbs which are ambiguous, e.g. play.…”
Section: Ambiguous COCO
Confidence: 99%