2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022
DOI: 10.1109/cvpr52688.2022.00300

Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition

Cited by 36 publications
(15 citation statements)
References 30 publications
“…Clustered attention is used to capture contextual spatial-temporal information, and transformer encoder-based techniques with different backbone networks extract features for learning actor interactions from multimodal inputs [12]. Additionally, MAC-Loss [38], a combination of spatial and temporal transformers in two complimentary orders, has been proposed to enhance the learning effectiveness of actor interactions and preserve actor consistency at the frame and video levels. Tamura et al [39] introduces a framework without using heuristic features for recognizing social group activities and identifying group members.…”
Section: Group Activity Recognition (GAR)
Citation type: mentioning
confidence: 99%
“…Transformer-based encoders, often coupled with diverse backbone networks, excel in extracting features for discerning actor interactions in multimodal data [46]. Recent innovations, such as MAC-Loss, introduce dual spatial and temporal transformers for enhanced actor interaction learning [47]. The field continues to evolve with heuristic-free approaches like those by Tamura et al, simplifying the process of social group activity recognition and member identification [48].…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Machine learning-based, especially deep learning, methods are capable of learning features at various levels of abstraction from the training data to obtain better performance than those using hand-crafted features. Among the recent deep learning methods, multi-head self-attention networks (MHSA)-based methods [8][9][10] achieved the best performance with a global receptive field, although not being computationally efficient. Graphs have shown great success in characterizing the structure of a group and the interactions existing in a group in recent years.…”
Section: Introduction
Citation type: mentioning
confidence: 99%