PMHI: Proposals From Motion History Images for Temporal Segmentation of Long Uncut Videos
2018 · DOI: 10.1109/lsp.2017.2778190

Cited by 14 publications (11 citation statements) · References 20 publications
“…Schindler and Van Gool (2008) claimed that one to seven frames are sufficient to recognise a basic action from a very short video shot. In contrast, Murtaza et al. (2018) claimed that the suitable number of frames can be determined adaptively from the video content, and that a typical video requires between ten and twenty-five frames. We chose ten for the window size W, similar to the algorithm of Paul et al. (2017), because it reflects the texture and motion features within video shots effectively.…”
Section: Experiments and Results Analysis
confidence: 99%
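The fixed window size W = 10 discussed in the excerpt above amounts to slicing a video into short fixed-length frame windows. A minimal index sketch, assuming non-overlapping windows (the function name and the stride-equals-width choice are illustrative, not from the cited papers):

```python
def sliding_windows(n_frames, w=10, stride=10):
    """Yield (start, end) frame-index windows of width w over a video.

    The last window is truncated if n_frames is not a multiple of stride.
    """
    return [(s, min(s + w, n_frames)) for s in range(0, n_frames, stride)]
```

With an overlap, one would simply set `stride < w`; the excerpt does not say whether the cited algorithm overlaps its windows.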
“…The key frame (KF) extraction algorithm should provide a compact video summarization with low processing time, preserve sufficient information from the video, and be simple to implement (Cao et al., 2012; Murtaza et al., 2018; Paul et al., 2017). In the literature, key frame extraction algorithms typically start by detecting shot changes to segment the video into several shots, and then extract the key frames from each shot (Truong & Venkatesh, 2007).…”
Section: Introduction
confidence: 99%
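The two-stage pipeline described in the excerpt above (detect shot changes, then extract key frames per shot) can be sketched as follows. The histogram-distance cut detector and the middle-frame heuristic are illustrative assumptions, not the methods of the cited papers:

```python
import numpy as np

def detect_shots(frames, cut_thresh=0.5):
    """Split a frame sequence into shots using grayscale-histogram distance.

    frames: list of 2-D uint8 arrays. Returns (start, end) pairs, end exclusive.
    A cut is declared when the L1 histogram distance between consecutive
    frames exceeds cut_thresh.
    """
    def hist(f):
        h, _ = np.histogram(f, bins=16, range=(0, 256))
        return h / h.sum()

    boundaries = [0]
    prev_h = hist(frames[0])
    for i, f in enumerate(frames[1:], start=1):
        h = hist(f)
        if 0.5 * np.abs(h - prev_h).sum() > cut_thresh:
            boundaries.append(i)
        prev_h = h
    boundaries.append(len(frames))
    return list(zip(boundaries[:-1], boundaries[1:]))

def key_frames(shots):
    """Pick the middle frame index of each shot as its key frame."""
    return [(s + e) // 2 for s, e in shots]
```

A real system would replace the middle-frame heuristic with content-driven selection (e.g. clustering or motion analysis), but the shot-then-keyframe structure matches the pipeline the excerpt describes.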
“…Numerous feature extraction methods have been proposed for HAR using RGB video data and have achieved successful recognition results. In particular, these methods include a 3D gradient-based spatiotemporal descriptor [15], the spatiotemporal interest point (STIP) detector [16], and motion-energy images (MEIs) and motion history images (MHIs) [17], [18]. The evolution of deep learning schemes, i.e., deep-learning-based convolutional neural networks (CNNs) and Long Short-Term Memory (LSTM) networks, has motivated researchers to explore their application to action recognition from RGB videos [19]-[22].…”
Section: Introduction
confidence: 99%
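The MHIs mentioned in the excerpt above follow the classic motion-templates formulation: pixels where motion is detected are stamped with the full duration value, and all other pixels decay over time. A minimal NumPy sketch, where the function name, threshold, and duration `tau` are illustrative choices rather than the cited papers' exact parameters:

```python
import numpy as np

def motion_history_image(frames, tau=10, thresh=30):
    """Compute a motion history image over a sequence of grayscale frames.

    frames: list of 2-D uint8 arrays.
    tau: temporal duration; moving pixels are stamped with this value.
    thresh: per-pixel absolute-difference threshold signalling motion.
    Recently moving pixels end up bright, older motion fades toward zero.
    """
    mhi = np.zeros(frames[0].shape, dtype=np.float32)
    for prev, curr in zip(frames, frames[1:]):
        diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
        motion = diff >= thresh
        # Stamp moving pixels with tau; decay everything else by one step.
        mhi = np.where(motion, float(tau), np.maximum(mhi - 1.0, 0.0))
    return mhi
```

A motion-energy image (MEI) is then just the binarization `mhi > 0`, indicating where any motion occurred within the last `tau` frames.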
“…Over the last few years, convolutional neural networks (CNNs) have led to improved accuracy in action recognition [5][6][7][8]. However, TAD methods [2], [10][11][12][13][14] still need improvement. In [10], a Pyramid of Score Distribution Feature (PSDF)-based TAD approach is proposed.…”
Section: Introduction
confidence: 99%