ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
DOI: 10.1109/icassp40776.2020.9053394

Attentional Fused Temporal Transformation Network for Video Action Recognition

Abstract: Effective spatiotemporal feature representation is crucial to the video-based action recognition task. Focusing on discriminative spatiotemporal feature learning, we propose the Information Fused Temporal Transformation Network (IF-TTN) for action recognition on top of the popular Temporal Segment Network (TSN) framework. In the network, an Information Fusion Module (IFM) is designed to fuse the appearance and motion features at multiple ConvNet levels for each video snippet, forming a short-term video descriptor. With fus…

Cited by 9 publications (4 citation statements)
References 36 publications
“…More information, e.g., sound can be added via new streams [166, 167, 168]. The architecture was further investigated by trying different ways of fusing the layers and deeper networks [169, 170, 171, 172]. To facilitate the high computational costs of 3D convolutional layers, Lin et al. [173] introduced the Temporal Shift Module (TSM) that can be incorporated into 2D CNNs to model the exchanges among neighboring frames while maintaining the lower computational costs of 2D CNNs.…”
Section: Machine Learning Algorithms For Human Motion Analysis
citation type: mentioning
confidence: 99%
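
The temporal shift idea mentioned in the statement above can be illustrated with a minimal NumPy sketch. It is not taken from the cited work: the function name `temporal_shift`, the `(T, C, H, W)` tensor layout, the zero padding at the clip boundaries, and the `fold_div` parameter with its default value are all assumptions made for illustration. A fraction of the channels is moved one frame forward or backward in time so that a following 2D convolution sees information from neighboring frames at no extra multiply-add cost.

```python
import numpy as np


def temporal_shift(x: np.ndarray, fold_div: int = 8) -> np.ndarray:
    """Shift a fraction of the channels along the time axis (zero-padded).

    x has the assumed shape (T, C, H, W): frames, channels, height, width.
    fold_div is a hypothetical parameter: C // fold_div channels are shifted
    in each temporal direction, and the rest are left in place.
    """
    t, c, h, w = x.shape
    fold = c // fold_div
    out = np.zeros_like(x)
    # First group of channels: each frame receives values from the next frame.
    out[:-1, :fold] = x[1:, :fold]
    # Second group of channels: each frame receives values from the previous frame.
    out[1:, fold:2 * fold] = x[:-1, fold:2 * fold]
    # Remaining channels are copied unchanged.
    out[:, 2 * fold:] = x[:, 2 * fold:]
    return out


# Toy clip: 3 frames, 8 channels, 1x1 spatial size.
clip = np.arange(3 * 8, dtype=float).reshape(3, 8, 1, 1)
shifted = temporal_shift(clip, fold_div=8)
```

With `fold_div=8` and 8 channels, one channel is shifted in each direction and the other six pass through untouched; the output has the same shape as the input, so such a step could sit in front of an ordinary 2D convolution.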
“…In order to integrate spatial and temporal information comprehensively, we concatenate spatial feature $F_s$, motion feature $F_t$ and the output $F_{st}$ of FCL. The fusion and concatenation process is depicted in (7).…”
Section: LSTM and Spatiotemporal Fusion
citation type: mentioning
confidence: 99%
“…In this paper, to validate the performance of the proposed iCBAM-based method, we compare our proposed method to recent popular and related approaches, including IDT [25], two-stream [3], TSN [6], C3D [28], two-stream + IDT [4], IF-TTN [7], DTPP [5] and attention-based models [32]-[34]. Since the proposed iCBAM-based spatiotemporal-stream network is trained from scratch, we do not compare it with those are pre-trained on the large dataset.…”
Section: Experiments Analysis On HMDB51
citation type: mentioning
confidence: 99%