Proceedings of the Third Workshop on Vision and Language 2014
DOI: 10.3115/v1/w14-5403

TUHOI: Trento Universal Human Object Interaction Dataset

Abstract: This paper describes the Trento Universal Human Object Interaction dataset (TUHOI), which is dedicated to human-object interactions in images. Recognizing human actions is an important yet challenging task. Most available datasets in this field are limited in the number of actions and objects they cover. A large dataset with varied actions and human-object interactions is needed for training and evaluating complex and robust human action recognition systems, especially systems that combine knowledge learned from language…
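
The abstract frames each example as an action verb grounded in an image together with the object it applies to. As a rough sketch only (the page does not show TUHOI's actual annotation format, so every name below is hypothetical), such ⟨verb, object⟩ records and a per-verb index for evaluation might look like this:

```python
from dataclasses import dataclass

@dataclass
class HOIAnnotation:
    # One human-object interaction in an image. Field names are
    # illustrative only, not TUHOI's actual release format.
    image_id: str
    verb: str          # action performed by the person, e.g. "ride"
    object_label: str  # object the action applies to, e.g. "bicycle"

def group_by_verb(annotations):
    # Index annotations by action verb, e.g. for per-action evaluation.
    index = {}
    for ann in annotations:
        index.setdefault(ann.verb, []).append(ann)
    return index

# Made-up example records:
anns = [
    HOIAnnotation("img_001", "ride", "bicycle"),
    HOIAnnotation("img_002", "throw", "bag"),
    HOIAnnotation("img_001", "hold", "helmet"),
]
print(sorted(group_by_verb(anns)))  # ['hold', 'ride', 'throw']
```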

Cited by 32 publications (19 citation statements)
References 16 publications

“…The events in (i) and (j) that correspond to 'throwing a bag' and 'pushing a bicycle' include the interaction of objects, which could be captured with our approach. Since large datasets for object interactions are available [5,21,24], our framework could be extended to learn such knowledge, and this could be another direction for future work.…”
Section: Qualitative Evaluation of Evidence Recounting
Confidence: 99%

“…Although they also present alternative models that use text corpora for descriptions that are more human-like, they are limited to verbs and do not cover prepositions. Le et al. (2014) examine prepositions modifying human actions (verbs), and conclude that these relate to positional information to a certain extent. Other related work includes training classifiers for prepositions with spatial relation features to improve image segmentation and detection (Fidler et al., 2013); this work is, however, limited to four prepositions.…”
Section: Related Work
Confidence: 99%

“…As a secondary evaluation dataset for the Multimodal Translation task, we collected and translated a set of image descriptions that potentially contain ambiguous verbs. We based our selection on the VerSe dataset (Gella et al., 2016), which annotates a subset of the COCO (Lin et al., 2014) and TUHOI (Le et al., 2014) images with OntoNotes senses for 90 verbs which are ambiguous, e.g. play.…”
Section: Ambiguous COCO
Confidence: 99%