Abstract: In this paper, a motion vector-based video key frame detection algorithm is proposed to solve the problems of false selection and missed selection caused by the difficulty of detecting moving-target characteristics in video key frames. First, the entropy of the adjacent-frame difference and the two-dimensional entropy of the image are introduced, and their combination is taken as the measure of the difference between video frames. Second, outliers are detected with statistical tools to obtain the le…
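The two entropy measures named in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the 3×3 neighborhood used for the 2-D image entropy and the weighting factor `alpha` in the combined measure are assumptions.

```python
import numpy as np

def frame_diff_entropy(prev, curr, bins=256):
    """Shannon entropy of the absolute difference between two grayscale frames."""
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    hist, _ = np.histogram(diff, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                      # drop empty bins before taking logs
    return float(-np.sum(p * np.log2(p)))

def two_d_entropy(frame, bins=256):
    """2-D image entropy over (pixel value, 3x3 neighborhood mean) pairs."""
    f = frame.astype(float)
    padded = np.pad(f, 1, mode="edge")
    acc = np.zeros_like(f)
    for dy in range(3):               # 3x3 box filter without external deps
        for dx in range(3):
            acc += padded[dy:dy + f.shape[0], dx:dx + f.shape[1]]
    mean = acc / 9.0
    hist, _, _ = np.histogram2d(f.ravel(), mean.ravel(),
                                bins=bins, range=[[0, 256], [0, 256]])
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def frame_dissimilarity(prev, curr, alpha=0.5):
    """Weighted combination of the two measures (alpha=0.5 is illustrative)."""
    return alpha * frame_diff_entropy(prev, curr) + \
           (1 - alpha) * abs(two_d_entropy(curr) - two_d_entropy(prev))
```

Frames whose dissimilarity scores are statistical outliers (e.g. beyond a few standard deviations from the mean) would then be flagged as key frame candidates.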
“…In this study, the keyframe extraction technique eliminates identical successive frames. Thus, keyframe extraction reduces the number of training frames and the computational cost of processing duplicate frames [40]. Algorithm 1 shows the pseudocode for keyframe extraction.…”
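The keyframe-extraction step quoted above (Algorithm 1 in the cited paper) can be sketched as follows. The mean-absolute-difference similarity proxy and the 0.02 threshold are assumptions for illustration, not the paper's exact pseudocode.

```python
import numpy as np

def extract_keyframes(frames, threshold=0.02):
    """Drop a frame when it is (near-)identical to the last kept frame.

    Similarity proxy: mean absolute pixel difference, normalized to [0, 1].
    """
    keyframes = []
    last = None
    for frame in frames:
        if last is None:              # always keep the first frame
            keyframes.append(frame)
            last = frame
            continue
        mad = np.mean(np.abs(frame.astype(float) - last.astype(float))) / 255.0
        if mad > threshold:           # frame differs enough: keep as keyframe
            keyframes.append(frame)
            last = frame
    return keyframes
```

Only the retained keyframes are fed to training, which is what reduces the number of training frames and the cost of processing duplicates.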
Violence recognition is crucial because of its applications in security and law enforcement. Existing semi-automated systems require tedious manual surveillance, which causes human error and makes these systems less effective. Several approaches have been proposed using trajectory-based, non-object-centric, and deep-learning-based methods. Previous studies have shown that deep learning techniques attain higher accuracy and lower error rates than other methods; however, their performance must still be improved. This study explores state-of-the-art deep learning architectures, convolutional neural networks (CNNs) and Inception-v4, to detect and recognize violence in video data. In the proposed framework, a keyframe extraction technique eliminates duplicate consecutive frames. This keyframing phase reduces the training data size and hence the computational cost by avoiding duplicate frames. For the feature selection and classification tasks, the applied sequential CNN uses a single kernel size, whereas the Inception-v4 CNN uses multiple kernel sizes in different layers of the architecture. For the empirical analysis, four widely used standard datasets with diverse activities are used. The results confirm that the proposed approach attains 98% accuracy, reduces the computational cost, and outperforms existing violence detection and recognition techniques.
“…Shot detection is an important step for video incident detection in the video-represented methods. The typical methods are absolute inter-frame difference [34], color histogram [35], frame pixel difference [36], frame correlation coefficient [37], compressed domain difference [38], edge tracking [39], motion vector [40] and some deep learning methods, such as 3-D ConvNet [41], Two-Stream CNN [42] and CNN-LSTM [43]. We compared the above method with our method in terms of the dimension of raw data, the number of parameters representing the incident, the time complexity of the feature extraction algorithm, and the memory required by the algorithm.…”
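Of the classical shot-detection methods listed in this snippet, the color-histogram approach [35] is the simplest to sketch. The grayscale histogram, 32 bins, and 0.5 boundary threshold below are illustrative choices, not values from the cited work.

```python
import numpy as np

def hist_distance(f1, f2, bins=32):
    """Normalized L1 distance between intensity histograms, in [0, 1]."""
    h1, _ = np.histogram(f1, bins=bins, range=(0, 256))
    h2, _ = np.histogram(f2, bins=bins, range=(0, 256))
    h1 = h1 / h1.sum()
    h2 = h2 / h2.sum()
    return 0.5 * float(np.abs(h1 - h2).sum())

def detect_shot_boundaries(frames, threshold=0.5):
    """Indices i where frame i starts a new shot (histogram jump vs i-1)."""
    return [i for i in range(1, len(frames))
            if hist_distance(frames[i - 1], frames[i]) > threshold]
```

The other pixel-domain methods (absolute inter-frame difference, frame correlation) follow the same pattern with a different per-pair distance function.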
Section: Performance Analysis (mentioning)
confidence: 99%
“…The detailed experimental results are shown in Table 10 [40]. As shown in Table 10, each frame of the video is an image; with the time dimension added, the video sequence is actually 3-D sequence data. Assume the width and height of each video frame are W and H. In practical applications, the video resolution H*W is often much greater than 320*480.…”
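The point about raw-data dimensionality can be made concrete with a back-of-the-envelope calculation; the 3 color channels, 25 fps, and 10 s duration below are illustrative assumptions, with only the 320*480 reference resolution coming from the quote.

```python
def raw_video_bytes(height, width, channels, fps, seconds):
    """Bytes needed to hold an uncompressed 8-bit-per-channel video clip."""
    return height * width * channels * fps * seconds

# Illustrative: a 10 s clip at the 320*480 reference resolution,
# assuming 3 color channels and 25 fps.
clip = raw_video_bytes(480, 320, 3, 25, 10)   # 115,200,000 bytes, ~110 MiB
```

Even at this modest resolution the raw video dwarfs a CSI sequence covering the same time span, which is the basis of the memory comparison in Table 10.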
Retrieving incidents from a video stream plays an important role in many computer vision applications. However, most video surveillance systems can neither recognize incidents nor support content-based retrieval before the video stream is saved to files. As an emerging sensing modality, Wi-Fi signals have the potential to serve as a signal synchronized with the video stream for incident detection and recognition. In this work, we simultaneously collect the video stream and the Wi-Fi signal in two surveillance scenarios and develop an LSTM-based classification model that recognizes incidents in those scenarios. Specifically, we first deploy a video surveillance system in two scenarios to capture the video stream and the synchronized Wi-Fi signal, which is very sensitive to environmental changes. Second, an incident detection method based on the entropy change of the Wi-Fi signal is proposed to find the start and end times of an incident in the CSI sequence, greatly reducing computational complexity compared with shot detection in the video stream. Third, the LSTM deep network is adopted to build an incident recognition model that classifies variable-length CSI segments into known categories corresponding to incident types. Fourth, using the Wi-Fi signal to locate and recognize incidents in the video stream, we build a fast content-based video retrieval system. Finally, the experimental evaluation was performed on a set of real Wi-Fi signal samples. The statistical results show that the proposed incident detection method is feasible and effective in locating incidents in video files, with an average error of 1.5 s. The evaluation results also demonstrate that the proposed multi-class model achieves average values of 0.972, 0.973, 0.985, 0.972, and 0.962 for recall, precision, accuracy, F1 score, and Kappa coefficient, respectively.
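The entropy-change incident detector described in the second step can be sketched as follows. The 100-sample window, 16-bin histogram, baseline-from-first-windows heuristic, and 3-sigma threshold are all assumptions for illustration, not the paper's parameters.

```python
import numpy as np

def window_entropy(x, bins=16):
    """Shannon entropy of the value distribution within one CSI window."""
    hist, _ = np.histogram(x, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def detect_incident(csi, win=100, k=3.0):
    """Return (start, end) sample indices where the sliding-window entropy
    deviates more than k standard deviations from a quiet baseline."""
    ent = np.array([window_entropy(csi[i:i + win])
                    for i in range(0, len(csi) - win, win)])
    base_mu = ent[:5].mean()            # assume the first windows are quiet
    base_sd = ent[:5].std() + 1e-9      # avoid division-free zero threshold
    active = np.abs(ent - base_mu) > k * base_sd
    if not active.any():
        return None
    idx = np.where(active)[0]
    return idx[0] * win, (idx[-1] + 1) * win
```

The detected (start, end) span is then what gets cut out of the synchronized video stream, avoiding shot detection over the full video.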
“…They enter their creations into competitions or share them with other VR creators on a small scale, but even the winning entries are almost unknown to the general public. As content producers, they want their work to be experienced by the public, and they are also eager to put it into the market to get feedback as a guide for subsequent creation [3]. Panoramic video, due to its own characteristics, covers 360° × 180° of view information, supports users in changing the viewing direction during the experience, and includes video, audio, subtitle, interaction, and other types of data.…”
To address the problems of virtual reality data processing and online video packaging, one transmission scheme uses the TILES tool in HEVC to partition the video into blocks and then applies MP4Box to package it and generate a DASH video stream. A method is proposed to process the same panoramic video at different quality levels. By designing a new index to measure the complexity of a coding tree unit, this method predicts the coding-tree-unit depth from the complexity index and the spatial correlation of the video, skips unnecessary traversal ranges, and achieves fast partitioning of coding units. Experimental results show that, compared with the latest HM16.20 reference model, the proposed algorithm reduces coding time by 37.25% while the BD-rate increases by only 0.74%, with almost no loss of video image quality.
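The fast CU partitioning idea (a complexity index plus spatial correlation restricting the depth search) can be sketched like this. The gradient-based complexity index and the two thresholds are illustrative stand-ins for the paper's actual index, which is not specified in this snippet.

```python
import numpy as np

def complexity_index(ctu):
    """Mean absolute horizontal + vertical gradient as a texture proxy."""
    f = ctu.astype(float)
    gx = float(np.abs(np.diff(f, axis=1)).mean())
    gy = float(np.abs(np.diff(f, axis=0)).mean())
    return gx + gy

def predicted_depth_range(ctu, neighbor_depths, t_low=2.0, t_high=10.0):
    """Restrict the CU depth search range (0..3 in HEVC) before traversal."""
    c = complexity_index(ctu)
    if c < t_low:        # smooth block: shallow splits suffice
        return 0, 1
    if c > t_high:       # detailed block: deep splits are likely
        return 2, 3
    # ambiguous texture: fall back to spatially neighboring CTU depths
    return min(neighbor_depths), max(neighbor_depths)
```

Skipping depths outside the predicted range is what removes the unnecessary traversal and yields the reported encoding-time savings.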
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.