2021
DOI: 10.1016/j.neucom.2020.12.069
Predicting short-term next-active-object through visual attention and hand position

Cited by 14 publications (11 citation statements) · References 21 publications
“…[19] also explored NAO prediction using cues from visual attention and hand position, but by only using a single frame for the prediction. That approach [19] is not able to differentiate between the past and future active object, since it does not account for the temporal information carried by the videos. Furnari et al. [12] also explored the NAO problem by taking into account the active/passive object definition of [29].…”
Section: Action Anticipation In Egocentric Videos
confidence: 99%
“…The output is composed of the next action label, a "hotspot" which indicates the area of the object where contact will occur, and the hand trajectory. The two egocentric datasets ADL [70] and EPIC-Kitchens [15] have been re-annotated by [46] to tackle the problem of short-term next-active-object detection. They proposed a novel human-centered approach composed of two pathways: 1) the first pathway generates a human visual attention probability map, and 2) the second one generates a human hand position probability map.…”
Section: Next Active Objects Detection
confidence: 99%
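The statement above describes the two-pathway design only at a high level. The snippet below is a minimal, hypothetical sketch of how two such probability maps might be fused to rank candidate next-active objects; the function names, the convex-combination fusion, and the box-scoring scheme are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch (not the authors' code) of fusing a visual-attention
# probability map with a hand-position probability map, then scoring
# candidate object boxes with the fused map.
import numpy as np

def fuse_maps(attention_map: np.ndarray, hand_map: np.ndarray,
              alpha: float = 0.5) -> np.ndarray:
    """Fuse two per-pixel probability maps via a convex combination."""
    fused = alpha * attention_map + (1.0 - alpha) * hand_map
    return fused / (fused.sum() + 1e-8)  # renormalize to a distribution

def score_boxes(fused: np.ndarray, boxes: list) -> list:
    """Score each (x1, y1, x2, y2) box by the mean fused probability inside it."""
    return [float(fused[y1:y2, x1:x2].mean()) for (x1, y1, x2, y2) in boxes]

# Toy usage: two synthetic 64x64 maps and two candidate object boxes.
rng = np.random.default_rng(0)
attention = rng.random((64, 64)); attention /= attention.sum()
hand = rng.random((64, 64)); hand /= hand.sum()
boxes = [(5, 5, 20, 20), (30, 30, 60, 60)]
scores = score_boxes(fuse_maps(attention, hand), boxes)
print("next-active-object candidate:", boxes[int(np.argmax(scores))])
```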
“…Due to the limited number of public datasets explicitly annotated to study the future intentions of humans, few past works explored the task of predicting the next-active objects considering the first-person point of view [32], using both RGB and depth signals [3], focusing on the hands [46], or also estimating the time to contact with the future active objects [41].…”
Section: Next Active Object Annotations
confidence: 99%
“…Previous works have investigated different forms of anticipation tasks, including next-active object predictions [5, 11, 19, 21, 32], predicting future actions [10, 12-14, 30, 35, 37], forecasting human-object interactions [26], predicting future hands [6, 20] or user trajectory prediction [31].…”
Section: Introduction
confidence: 99%