Revisiting spatio-temporal layouts for compositional action recognition

Radevski, Gorjan; Moens, Marie‐Francine; Tuytelaars, Tinne

doi:10.48550/arxiv.2111.01936

Cited by 2 publications

(2 citation statements)

References 47 publications


“…Obj. Top-1 Top-5
STIN [26] 37.2 62.4
I3D+STIN [26] 48.2 72.6
CAF [30] 52.3 78.9
STRG [42] 52.3 78.3
IRN (Ours) 52.9 80.8
I3D-STIN [26] 51.5 77.1
STRG-STIN [26] 56.2 81.3
CACNF [30] 56.9 82.5
224 × 224. We refer to this augmentation approach as SCR, which we use in all our experiments on EPIC-KITCHENS-100.…”

Section: Datasets Evaluation and Implementation Details (mentioning)

confidence: 99%

Hand-Object Interaction Reasoning

Ma, Damen

2022

Preprint


This paper proposes an interaction reasoning network for modelling spatio-temporal relationships between hands and objects in video. The proposed interaction unit utilises a Transformer module to reason about each acting hand, and its spatio-temporal relation to the other hand as well as objects being interacted with. We show that modelling two-handed interactions is critical for action recognition in egocentric video, and demonstrate that by using positionally-encoded trajectories, the network can better recognise observed interactions. We evaluate our proposal on EPIC-KITCHENS and Something-Else datasets, with an ablation study.


“…Such information can be readily and reliably extracted by modern deep learning algorithms and have been reported to enhance the accuracy of action recognition [6]. For example, [7] exploited the positional relations between instances and object categories, and achieved accurate scene-level object-centered action recognition. Recently, [8] proposed a new bi-modal network for action detection, which has an RGB stream and a pose stream, and demonstrated that the heterogeneous features provide essential information for accurate action detection.…”

Section: Introduction (mentioning)

confidence: 99%

Heterogeneous Feature Fusion for Improving Performance of Action Detection

Babazaki, Iwamoto, Takahashi et al.

2024

J. Phys.: Conf. Ser.


We present a novel framework aimed at improving video action detection through the integration of heterogeneous features. Conventional action detection methods, which focus on modeling the relationships between person/object instances, rely exclusively on video features and do not exploit valuable intra-instance heterogeneous features, such as person pose, positional information or object category, that can support action recognition. Our proposed framework, termed the Heterogeneous Feature Fusion (HFF) framework, addresses this limitation by integrating such intra-instance heterogeneous features for person/object instances, and can improve existing action detection methods. To efficiently exploit the heterogeneous features, which vary in importance depending on actions and/or scenes, we introduce an attention mechanism to dynamically enhance important heterogeneous features within an instance. Experiments on JHMDB and AVA v2.2 datasets show that our HFF significantly enhances the action detection performance of two existing methods.
