2020
DOI: 10.1007/978-3-030-58545-7_11

Joint Learning of Social Groups, Individuals Action and Sub-group Activities in Videos

Abstract: The state-of-the-art solutions for human activity understanding from a video stream formulate the task as a spatio-temporal problem which requires joint localization of all individuals in the scene and classification of their actions or group activity over time. Who is interacting with whom, e.g. not everyone in a queue is interacting with each other, is often not predicted. There are scenarios where people are best split into sub-groups, which we call social groups, and each social group may be engaged …
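To make the task formulation above concrete, here is a minimal, hypothetical sketch (not the authors' code) of what a per-frame prediction could contain: detected individuals with boxes and action labels, a partition of those individuals into social groups, and one activity label per group. All class and field names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PersonPrediction:
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image coordinates
    action: str                             # individual action label, e.g. "standing"


@dataclass
class FramePrediction:
    people: List[PersonPrediction] = field(default_factory=list)
    # Partition of person indices into social groups, e.g. [[0], [1, 2]]
    groups: List[List[int]] = field(default_factory=list)
    # One activity label per social group, aligned with `groups`
    group_activities: List[str] = field(default_factory=list)


# Example: a queue in which only two of the three people actually interact.
frame = FramePrediction(
    people=[
        PersonPrediction(box=(10, 20, 50, 120), action="standing"),
        PersonPrediction(box=(60, 22, 100, 118), action="talking"),
        PersonPrediction(box=(110, 25, 150, 119), action="talking"),
    ],
    groups=[[0], [1, 2]],
    group_activities=["queueing", "conversation"],
)
```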

Cited by 55 publications (29 citation statements). References 62 publications.

Citation statements:
“…Inspired by a transformer network [80] which relies on self-attention mechanisms to allow the network to adaptively extract the most relevant information and relationships, Gavrilyuk et al [21] proposed an actor-transformers network which learns interactions between the actors and adaptively extracts the important information for activity recognition.…”
Section: Deep Relationship Modeling (mentioning, confidence: 99%)
“…Several works tackle this problem from a graph-based perspective [40,63,100,101], such as applying Graph Convolutional Networks (GCNs) [49,96]. More recent works utilize attention modeling [63,73,98,103], including using Transformers [26,57] with a focus on determining the most critical persons [26,72,96,103], groups [24,57], or interactions [101]. Existing works have primarily used RGB- and optical-flow-based features with RoIAlign [33] to represent individuals [6,73,96,100].…”
Section: Group Activity Recognition (mentioning, confidence: 99%)
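The citance above mentions graph-based modeling (GCNs over actors whose node features come from RoIAlign). Below is a generic, minimal graph-convolution step over an actor graph; the distance-based adjacency is a hypothetical heuristic used only to make the example runnable, not the relation construction of any of the cited works.

```python
import torch
import torch.nn as nn


class ActorGCNLayer(nn.Module):
    """One graph-convolution step: aggregate neighbouring actors' features
    through a row-normalised adjacency matrix, then transform them."""

    def __init__(self, in_dim=256, out_dim=256):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x:   (num_actors, in_dim)            per-actor features (e.g. from RoIAlign)
        # adj: (num_actors, num_actors)        relation weights between actors
        adj = adj / adj.sum(dim=-1, keepdim=True).clamp(min=1e-6)  # row-normalise
        return torch.relu(self.linear(adj @ x))


def distance_adjacency(centers, sigma=50.0):
    """Illustrative adjacency: spatially close actors get larger edge weights."""
    d = torch.cdist(centers, centers)  # pairwise box-centre distances
    return torch.exp(-(d ** 2) / (2 * sigma ** 2))


# Toy usage: 5 actors with 2-D box centres and 256-d appearance features.
centers = torch.rand(5, 2) * 200
feats = torch.randn(5, 256)
refined = ActorGCNLayer()(feats, distance_adjacency(centers))
print(refined.shape)  # torch.Size([5, 256])
```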
“…COMPOSER facilitates compositional learning and a high-level semantic understanding of the video with a Multiscale Transformer that performs relational reasoning over these tokens scale by scale. … and crowd behavior analysis [24,26,72,104]. Compared to the single-person atomic action recognition task, GAR requires addressing two additional challenges.…”
Section: Introduction (mentioning, confidence: 99%)
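As a loose illustration of the scale-by-scale relational reasoning this citance attributes to COMPOSER: refine fine-scale tokens (persons) with self-attention, pool them into coarser tokens (groups), and refine those again. This is a simplified sketch under assumed shapes and grouping, not the COMPOSER architecture.

```python
import torch
import torch.nn as nn


class MultiscaleReasoning(nn.Module):
    """Refine person tokens with self-attention, pool them into group tokens,
    and refine the group tokens in turn (scale by scale)."""

    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        person_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
        group_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
        self.person_encoder = nn.TransformerEncoder(person_layer, num_layers=1)
        self.group_encoder = nn.TransformerEncoder(group_layer, num_layers=1)

    def forward(self, person_tokens, group_index):
        # person_tokens: (batch, num_persons, dim)
        # group_index:   list of person-index lists, e.g. [[0, 1, 2], [3, 4]]
        persons = self.person_encoder(person_tokens)  # fine-scale reasoning
        groups = torch.stack(
            [persons[:, idx].mean(dim=1) for idx in group_index], dim=1
        )                                             # pool persons into group tokens
        groups = self.group_encoder(groups)           # coarse-scale reasoning
        return persons, groups


# Toy usage: 2 clips, 5 persons split into two groups.
p, g = MultiscaleReasoning()(torch.randn(2, 5, 256), [[0, 1, 2], [3, 4]])
print(p.shape, g.shape)  # torch.Size([2, 5, 256]) torch.Size([2, 2, 256])
```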