Proceedings of the 29th ACM International Conference on Multimedia 2021
DOI: 10.1145/3474085.3475438

TSA-Net: Tube Self-Attention Network for Action Quality Assessment

Abstract: In recent years, assessing action quality from videos has attracted growing attention in the computer vision and human-computer interaction communities. Most existing approaches tackle this problem by directly migrating a model from action recognition tasks, which ignores intrinsic differences within the feature map, such as foreground and background information. To address this issue, we propose a Tube Self-Attention Network (TSA-Net) for action quality assessment (AQA). Specifically, we introduce a sin…
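
The mechanism named in the abstract can be sketched roughly as follows. This is a minimal sketch based on one reading of the abstract, not the authors' released implementation: the function name, tensor shapes, and the omission of the learned query/key/value projections of a full non-local block are all simplifying assumptions. The key idea is that attention is computed only among feature positions that fall inside tracker-derived tube masks, which is what makes the feature interaction sparse.

```python
# Minimal sketch (assumed, not the authors' code) of tube self-attention:
# non-local-style attention restricted to feature positions inside
# tracker-derived tube masks, leaving background positions untouched.
import torch
import torch.nn.functional as F

def tube_self_attention(feat, tube_mask):
    """feat: (T, C, H, W) backbone features; tube_mask: (T, H, W) bool mask
    of foreground positions produced by an external single-object tracker."""
    T, C, H, W = feat.shape
    x = feat.permute(0, 2, 3, 1).reshape(-1, C)            # (T*H*W, C)
    idx = tube_mask.reshape(-1).nonzero(as_tuple=True)[0]  # tube positions
    tube = x[idx]                                          # (N, C), N << T*H*W
    # Self-attention among tube positions only (learned projections omitted).
    attn = F.softmax(tube @ tube.t() / C ** 0.5, dim=-1)   # (N, N)
    out = x.clone()
    out[idx] = out[idx] + attn @ tube                      # residual update
    return out.reshape(T, H, W, C).permute(0, 3, 1, 2)
```

The sparsity comes from attending over N tube positions instead of all T·H·W positions, which matches the efficiency argument the abstract hints at.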


Cited by 37 publications (18 citation statements) · References 41 publications

Citation statements (ordered by relevance):
“…Under the 'w/o DD' setting, our method achieves 0.9451 (Sp. Corr) and 0.3222 (R-ℓ2), outperforming CoRe and the recent TSA-Net [27]. It is worth noting that TSA-Net utilizes an external VOT tracker [35] to extract human locations and then enhance backbone features, which is orthogonal to the main issue of temporal parsing addressed in our work.…”
Section: Comparison to State-of-the-art (mentioning)
confidence: 96%
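
For reference, the two numbers quoted throughout these statements are the standard AQA metrics: Spearman rank correlation (Sp. Corr) and relative ℓ2 distance (R-ℓ2, reported ×100). Below is a minimal sketch of the usual definitions, assuming the range-normalized squared-error formulation of R-ℓ2 popularized by CoRe [32]; the function names are illustrative, not from any cited codebase.

```python
# Sketch of the two standard AQA metrics (assumed definitions; R-l2
# follows the range-normalized squared error used by CoRe [32]).
import numpy as np
from scipy import stats

def spearman_corr(pred, gt):
    """Spearman rank correlation between predicted and ground-truth scores."""
    rho, _ = stats.spearmanr(pred, gt)
    return rho

def relative_l2(pred, gt, score_min, score_max):
    """R-l2 (reported x100): squared error normalized by the score range."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    return 100.0 * np.mean(((pred - gt) / (score_max - score_min)) ** 2)
```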
“…On the MTL-AQA dataset, we evaluated our method under two different settings, following prior work [32]. Specifically, the MTL-AQA dataset contains the label of difficulty degree (DD), and each video's quality score is calculated by multiplying the raw score by its difficulty.

Method (w/o DD)       Sp. Corr   R-ℓ2 (×100)
Pose+DCT [20]         0.2682     -
C3D-SVR [19]          0.7716     -
C3D-LSTM [19]         0.8489     -
MSCADC-STL [18]       0.8472     -
C3D-AVG-STL [18]      0.8960     -
MSCADC-MTL [18]       0.8612     -
C3D-AVG-MTL [18]      0.9044     -
USDL [22]             0.9066     0.654
CoRe [32]             0.9341     0.365
TSA-Net [27]          0.9422     -
Ours                  0.9451     0.3222

Method (w/ DD)        Sp. Corr   R-ℓ2 (×100)
USDL [22]             0.9231     0.468
MUSDL [22]            0.9273     0.451
CoRe [32]             0.9512     0.260
Ours                  0.9607     0.2378…”
Section: Comparison to State-of-the-art (mentioning)
confidence: 99%
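
As a toy illustration of the 'w/ DD' computation described in the statement above (all numbers hypothetical, not taken from MTL-AQA):

```python
# Toy 'w/ DD' example (hypothetical numbers): the final quality score is
# the raw execution score multiplied by the dive's difficulty degree (DD).
raw_score = 8.5          # hypothetical judged execution score
difficulty_degree = 3.2  # hypothetical DD label from the dataset
final_score = raw_score * difficulty_degree
print(final_score)       # 27.2
```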
“…Fall Recognition in Figure Skating (FR-FS) [11]. Existing AQA datasets of figure skating [5], [10] only contain long videos, in which detailed information is drowned out, ultimately degrading the evaluation performance of the AQA model.…”
Section: UNLV Dive and UNLV Vault (mentioning)
confidence: 99%
“…Existing AQA datasets of figure skating [5], [10] only contain long videos, in which detailed information is drowned out, ultimately degrading the evaluation performance of the AQA model. To solve this problem, Wang et al. [11] proposed FR-FS to recognize falls in figure skating, planning to gradually build a fine-grained AQA system. Videos in FR-FS contain the athlete's take-off, rotation, and landing movements.…”
Section: UNLV Dive and UNLV Vault (mentioning)
confidence: 99%