Convolutional Networks With Channel and STIPs Attention Model for Action Recognition in Videos
2020 | DOI: 10.1109/tmm.2019.2953814

Cited by 29 publications (12 citation statements)
References 49 publications
“…In this sense, a new line of approaches is also emerging, namely the use of transformers (Girdhar et al., 2019; Liu et al., 2019) and attention mechanisms (Ke et al., 2019; Qiao et al., 2020; Wu et al., 2020). Commonly for fine-grained action recognition, the frame or sequence of frames incorporates irrelevant or redundant information, with no discriminatory property.…”
Section: Vision-Based DL Methods for HAR/HAP (mentioning)
confidence: 99%
“…So, these algorithms guide the model to use attentional regions, instead of the whole frame, to enhance local features and achieve selective feature fusion. For example, Wu et al. (2020) implemented channel-wise and spatial attention mechanisms, along with baseline CNNs (VGG16 and ResNet-50) and an LSTM. Additionally, compared to LSTMs, transformers can be a lighter and perhaps more suitable alternative for online performance (Kozlov et al., 2020).…”
Section: Vision-Based DL Methods for HAR/HAP (mentioning)
confidence: 99%
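
For readers unfamiliar with the mechanisms this excerpt refers to, the following is a minimal PyTorch-style sketch of channel-wise and spatial attention modules that reweight per-frame CNN feature maps before temporal modeling. The class names, reduction ratio, and tensor shapes are illustrative assumptions, not the exact implementation of Wu et al. (2020) or of the paper under review.

```python
# Minimal sketch (assumed names/shapes): channel attention reweights feature-map
# channels via global pooling + a bottleneck MLP; spatial attention reweights
# locations via a small convolution over pooled maps.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # (B, C, 1, 1)
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                            # x: (B, C, H, W)
        w = self.mlp(self.pool(x).flatten(1))        # (B, C) channel gates in [0, 1]
        return x * w.view(x.size(0), -1, 1, 1)       # reweight channels

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg_map = x.mean(dim=1, keepdim=True)        # (B, 1, H, W)
        max_map = x.amax(dim=1, keepdim=True)
        gate = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * gate                              # reweight spatial locations

# Usage: refine per-frame backbone features (e.g. a ResNet-50 conv output),
# then pool each frame to a vector and feed the sequence to an LSTM.
frame_feats = torch.randn(2, 2048, 7, 7)             # dummy batch of frame features
refined = SpatialAttention()(ChannelAttention(2048)(frame_feats))
```

The channel branch learns which feature maps matter and the spatial branch learns which locations matter, which is the sense in which such attention suppresses the irrelevant or redundant frame content mentioned in the excerpt above.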
“…Liu et al. [65] explored residual and squeeze-and-excitation structures for feature extraction and proposed context beam search to integrate the Transformer-based [63] language model into CTC-based methods. The attention mechanism, which has been widely adopted in scene text recognition [35], [30], [67], action recognition [37], [38], and video processing [39], can also be applied to offline HCTR. Xiu et al. [40] improved the attention-based decoder by a multi-level multi-modal fusion network.…”
Section: A. Offline Handwritten Chinese Text Recognition (mentioning)
confidence: 99%
“…Recent years have witnessed rapid growth of deep learning models, especially deep convolutional neural networks (CNNs). With the successful applications of CNNs in other low-level vision tasks like super resolution [41] and image denoising [50], CNNs have also been widely used for the SIRR problem [10], [13]-[15], [26], [34], [35], [42], [46], [49]. People usually deploy functional blocks from network structures such as residual nets [17], [35], dense nets [10], [20] and squeeze-and-excitation nets [19], [26] to enhance their networks.…”
Section: Introduction (mentioning)
confidence: 99%
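
As a generic illustration of the "functional blocks" named in this excerpt, here is a minimal sketch of a squeeze-and-excitation residual block. The class name and reduction ratio are assumptions for illustration, not code from the cited SIRR papers.

```python
# Minimal sketch of an SE-enhanced residual block, of the kind commonly used
# to strengthen image-restoration CNNs (assumed names; not the cited papers' code).
import torch
import torch.nn as nn

class SEResidualBlock(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # Squeeze-and-excitation: global pooling + bottleneck 1x1 convs give
        # per-channel gates that rescale the residual branch.
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        res = self.body(x)
        return x + res * self.se(res)        # identity shortcut + gated residual

block = SEResidualBlock(64)
out = block(torch.randn(1, 64, 32, 32))      # shape preserved: (1, 64, 32, 32)
```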