2021
DOI: 10.1109/access.2021.3101175

Predicting Actions in Videos and Action-Based Segmentation Using Deep Learning

Abstract: In this paper, we propose a technique to recognize multiple actions in a video using deep learning. The proposed approach is concerned with interpreting the overall context of a video and transforming it into one or more appropriate actions. In order to cope with multiple actions in a video, our proposed technique first determines the individual segments/shots in a video using intersections of color histograms. The segmented parts are then fed to the action recognition system comprising a combination of a Conv…
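To make the shot-segmentation step in the abstract concrete, the following is a minimal sketch of boundary detection by color histogram intersection. It is not the authors' implementation: the function names, the bin count, the threshold of 0.5, and the representation of a frame as a non-empty list of (r, g, b) tuples are all assumptions chosen for illustration.

```python
def color_histogram(pixels, bins=8):
    """Normalized per-channel histogram of one frame.

    `pixels` is assumed to be a non-empty list of (r, g, b) tuples
    with integer values in 0..255 (a hypothetical frame format).
    """
    width = 256 // bins
    hist = [0.0] * (3 * bins)
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            index = min(value // width, bins - 1)
            hist[channel * bins + index] += 1.0
    total = sum(hist)
    return [h / total for h in hist]


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima; 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def detect_shot_boundaries(frames, threshold=0.5, bins=8):
    """Return indices i where frame i starts a new shot.

    A boundary is declared when the intersection between consecutive
    frame histograms drops below `threshold` (an assumed value).
    """
    if not frames:
        return []
    boundaries = []
    previous = color_histogram(frames[0], bins)
    for i in range(1, len(frames)):
        current = color_histogram(frames[i], bins)
        if histogram_intersection(previous, current) < threshold:
            boundaries.append(i)
        previous = current
    return boundaries


# Synthetic example: three dark frames followed by three bright frames.
dark = [(10, 10, 10)] * 16
bright = [(200, 200, 200)] * 16
frames = [dark, dark, dark, bright, bright, bright]
boundaries = detect_shot_boundaries(frames)
```

In this synthetic example the only low-intersection pair is between the last dark frame and the first bright frame, so each resulting segment could then be passed separately to an action recognition model. On real video the threshold would need tuning, since gradual transitions produce smaller histogram changes than hard cuts.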

Citations: cited by 7 publications (5 citation statements)
References: 64 publications
“…In [52], a pretrained ResNet is used to derive a feature representation for each frame and an LSTM to process the temporal information. In [3], a 2D-CNN and LSTM are used to process the spatiotemporal video information, and in addition, shot boundary detection is applied to segment and predict multiple actions occurring in a video. PivotCorrNN [53] introduces contextual gated recurrent units (cGRUs) to exploit time-varying information among different modalities (MFCC, IDT, etc.).…”
Section: Top-down Approaches (mentioning)
confidence: 99%
“…It could be one of the obstacles [18,19,20]. The significant challenging point in object detection is occlusion and it may be full/partial and can occur anytime an object passes behind another object [17,21].…”
Section: Some Challenges In Video Object Detection (mentioning)
confidence: 99%
“…More importantly, the CNN can process large-scale high-definition resolution images. [7][8]. Therefore, the CNN reduces the amount of computation by means of sparse connections, as shown in Figure 1, which represents the difference between sparse connections and full connections.…”
Section: Basic Structure Of CNN (mentioning)
confidence: 99%