Forecasting Action Through Contact Representations From First Person Video

Dessalene, Eadom; Devaraj, Chinmaya; Maynord, Michael; Fermüller, Cornelia; Aloimonos, Yiannis

doi:10.1109/tpami.2021.3055233

Cited by 42 publications

(32 citation statements)

References 43 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Action Anticipation Different works have tackled this task [12,41,6,29]. Previous approaches have considered baselines designed for action recognition [4], defined custom losses [11], modeled the evolution of scene attributes and action over time [31], disentangled the tasks of encoding and anticipation [12], aggregated features over time [41], predicted motor attention [29], leveraged contact representations [6], mimicked intuitive and analytical thinking [53], and predicted future representations [51]. While these approaches have been designed to maximize performance when predicting the future, they have never been evaluated in a streaming scenario.…”

Section: Egocentricmentioning

confidence: 99%

“…Wearable devices equipped with egocentric cameras are recently attracting attention as an ideal platform to implement intelligent agents able to provide assistance to humans in a natural way [22]. Among the different problems studied in egocentric vision, the task of action anticipation, which consists in predicting a plausible future action before it is performed by the camera wearer, has attracted a lot of attention [2,4,6,12,29,34,41,51,53]. Indeed, from a practical point of view, being able to predict future events is fundamental when designing technologies which can assist humans in their daily and working activities [24,43].…”

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

Towards Streaming Egocentric Action Anticipation

Furnari¹,

Farinella²

2021

Preprint

View full text Add to dashboard Cite

show abstract

Section: Egocentricmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

Towards Streaming Egocentric Action Anticipation

Furnari¹,

Farinella²

2021

Preprint

View full text Add to dashboard Cite

show abstract

“…The main focus of these works is to extract relevant information from the observations to predict the label of the action starting in τ seconds, varying between zero [32] to 10s of seconds [33]. Other models leverage external cues such as hand movements to help with the anticipation task [34,35].…”

Section: Related Workmentioning

confidence: 99%

Weakly-Supervised Dense Action Anticipation

Zhang

Chen

Yao

2021

Preprint

View full text Add to dashboard Cite

Dense anticipation aims to forecast future actions and their durations for long horizons. Existing approaches rely on fully-labelled data, i.e. sequences labelled with all future actions and their durations. We present a (semi-) weakly supervised method using only a small number of fullylabelled sequences and predominantly sequences in which only the (one) upcoming action is labelled. To this end, we propose a framework that generates pseudo-labels for future actions and their durations and adaptively refines them through a refinement module. Given only the upcoming action label as input, these pseudo-labels guide action/duration prediction for the future. We further design an attention mechanism to predict context-aware durations. Experiments on the Breakfast and 50Salads benchmarks verify our method's effectiveness; we are competitive even when compared to fully supervised state-of-the-art models. We will make our code available at: https://github.com/zhanghaotong1/WSLVideoDenseAnticipation.

show abstract

“…Recognizing how hands interact with objects is crucial to understand how we interact with the world. Hand-object interaction analysis contributes to several fields such as action prediction [10], rehabilitation [28], robotics [38], and virtual reality [17].…”

Section: Introductionmentioning

confidence: 99%

Hand-Object Contact Prediction via Motion-Based Pseudo-Labeling and Guided Progressive Label Correction

Yagi¹,

Hasan²,

Sato³

2021

Preprint

View full text Add to dashboard Cite

Every hand-object interaction begins with contact. Despite predicting the contact state between hands and objects is useful in understanding hand-object interactions, prior methods on hand-object analysis have assumed that the interacting hands and objects are known, and were not studied in detail. In this study, we introduce a video-based method for predicting contact between a hand and an object. Specifically, given a video and a pair of hand and object tracks, we predict a binary contact state (contact or no-contact) for each frame. However, annotating a large number of hand-object tracks and contact labels is costly. To overcome the difficulty, we propose a semi-supervised framework consisting of (i) automatic collection of training data with motion-based pseudo-labels and (ii) guided progressive label correction (gPLC), which corrects noisy pseudo-labels with a small amount of trusted data. We validated our framework's effectiveness on a newly built benchmark dataset for hand-object contact prediction and showed superior performance against existing baseline methods. Code and data are available at https: //github.com/takumayagi/hand_object_contact_prediction.

show abstract

Forecasting Action Through Contact Representations From First Person Video

Cited by 42 publications

References 43 publications

Towards Streaming Egocentric Action Anticipation

Towards Streaming Egocentric Action Anticipation

Weakly-Supervised Dense Action Anticipation

Hand-Object Contact Prediction via Motion-Based Pseudo-Labeling and Guided Progressive Label Correction

Contact Info

Product

Resources

About