2017
DOI: 10.1186/s13640-017-0235-9

Gated spatio and temporal convolutional neural network for activity recognition: towards gated multimodal deep learning

Abstract: Human activity recognition requires both visual and temporal cues, making it challenging to integrate these important modalities. The usual schemes for integration are averaging and fixing the weights of both features for all samples. However, how much weight each sample and modality needs is still an open question. A mixture of experts via a gating Convolutional Neural Network (CNN) is one promising architecture for adaptively weighting every sample within a dataset. In this paper, rather than just a…
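The contrast the abstract draws between fixed averaging and per-sample gating can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the example class probabilities, and the gate logits are all hypothetical, and a real gating CNN would produce the logits from the input rather than take them as constants.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def average_fusion(spatial_probs, temporal_probs):
    """Fixed scheme: both streams get weight 0.5 for every sample."""
    return [(s + t) / 2 for s, t in zip(spatial_probs, temporal_probs)]


def gated_fusion(spatial_probs, temporal_probs, gate_logits):
    """Per-sample scheme: a gate decides how much each stream counts.

    gate_logits stands in for the output of a gating network
    (hypothetical here; one logit per stream).
    """
    w_spatial, w_temporal = softmax(gate_logits)
    return [w_spatial * s + w_temporal * t
            for s, t in zip(spatial_probs, temporal_probs)]


# Hypothetical class probabilities from the two streams for one sample.
spatial = [0.7, 0.2, 0.1]
temporal = [0.2, 0.6, 0.2]

# Hypothetical gate output that trusts the spatial stream more for this sample.
gate_logits = [2.0, 0.0]

fused_avg = average_fusion(spatial, temporal)
fused_gated = gated_fusion(spatial, temporal, gate_logits)
print(fused_avg)
print(fused_gated)
```

With averaging, the two streams disagree and the fused scores land in between; with the gate, the fused scores follow whichever stream the gate favors for that particular sample.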

Cited by 26 publications (16 citation statements)
References 16 publications
“…This is consistent with results in other NLP tasks. As for image encoders, VGGNet achieves higher scores than ResNet, which is often observed in multimodal tasks (Wang et al, 2017; Ouyang et al, 2017; Yudistira and Kurita, 2017). BERT × VGGNet using all the input modalities achieves the highest R10@1 score of 53.6%.…”
Section: Quantitative Results (mentioning)
confidence: 98%
“…Deep learning based human action recognition solutions is proliferating with an added advantage one over another. Multistream deep architectures [8] [9] have surpassed the performances single stream deep state-of-the-arts [1] [16] due to the fact that such architectures are enriched with fusion of different types of action cues-temporal, motion, and spatial. The motion between frames is majorly defined as optical flow [16].…”
Section: Related Work (mentioning)
confidence: 99%
“…The mainstream literature listed above [5] [8] [9] [17] targeted action recognition from a common viewpoint. Such frameworks fail to produce a good performance for different viewpoint test samples.…”
Section: Related Work (mentioning)
confidence: 99%
“…Human behavior recognition is one of the growing research topics in computer vision and pattern recognition. Human behavior recognition is usually applied in machine learning to monitoring human activities and getting insight from them [1]. The behavioral examination can help solve many problems in indoor as well as outdoor surveillance systems.…”
Section: Introduction (mentioning)
confidence: 99%