Abstract: To address the low accuracy and high complexity of detecting gradual shot boundaries and long shots, a new video shot boundary detection algorithm based on a feature fusion and clustering technique (FFCT) is proposed. In the algorithm, interval frames of the video sequence are selected, converted to gray images, and scaled by sampling. From these frames, speeded-up robust features (SURF) and fingerprint features are extracted from the non-compressed domain and the compressed domain, and then the extracted features are …
“…A K-means clustering-based classification technique was proposed in [61]. K-means clustering and linear discriminant analysis (LDA) using SURF and DCT fingerprint features were proposed in [40]. Some well-trained CNNs were utilized for low-level and high-level feature extraction in [62] and [63].…”
Section: B. Scene Transition Pattern Feature-based Approach (mentioning)
confidence: 99%
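The K-means-style frame classification referenced in the snippet above can be illustrated with a minimal numpy-only sketch. This is an assumption-laden toy: the synthetic vectors stand in for fused SURF + DCT-fingerprint frame descriptors (real descriptors would come from an image library), and a simple 2-means split is used in place of the cited pipeline.

```python
import numpy as np

def two_means(X, iters=50):
    """Minimal 2-means clustering. Centers are initialized at the two
    extreme rows so the well-separated synthetic clusters below are
    split deterministically (an assumption of this sketch)."""
    centers = np.stack([X[np.argmin(X.sum(axis=1))],
                        X[np.argmax(X.sum(axis=1))]]).astype(float)
    for _ in range(iters):
        # Assign each frame descriptor to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center as the mean of its assigned descriptors.
        for j in range(2):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Synthetic stand-ins for fused SURF + DCT-fingerprint descriptors:
rng = np.random.default_rng(0)
normal_frames = rng.normal(0.0, 0.1, size=(50, 8))   # similar adjacent frames
boundary_frames = rng.normal(3.0, 0.1, size=(5, 8))  # large inter-frame change
feats = np.vstack([normal_frames, boundary_frames])
labels = two_means(feats)
n_boundary = int((labels == labels[-1]).sum())  # frames grouped with the last
print(n_boundary)  # → 5
```

In the cited FFCT-style pipelines, the minority cluster of such a split is treated as the candidate boundary class before further verification.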
“…The HSI-CNN [12] method achieved a 100% recall rate for both the "Zoom-in" and "Zoom-out" motion, whereas our method achieved lower accuracy. In Table 17, we compare with the feature fusion and K-means clustering-based (FFCT) method [40] for gradual transition detection on video datasets with low- and high-dimensional frames. It is observed that our proposed SOCMR method achieved up to 4.45% and 7.29% higher F1 rates for videos with lower (≤ 352 × 288) and higher (≥ 720 × 1280) frame dimensions, respectively.…”
Section: Camera Motion, Object Motion (mentioning)
confidence: 99%
“…Comparison with FFCT [40] on gradual transition detection at low (≤ 352 × 288) and high (≥ 720 × 1280) frame dimensions…”
Semantic video scene-understanding applications rely on object-camera motion recognition techniques for scene contextual movement representation. While existing machine learning-based methods perform efficiently, their primary limitation is that they analyze motion patterns from normal frames only, neglecting scene transition frames. This causes significant false alarms due to undetected object-camera motion patterns during scene transitions. In this paper, we propose a novel method for recognizing the object and camera motion of two consecutive scenes from their transition frames. First, our method detects cut transitions using principal component analysis (PCA) to segment the video into shots. Additionally, it eliminates large text transitions, which are often falsely detected as cut transitions, using structural similarity index measure (SSIM) properties. Second, it selects candidate segments to localize normal and wipe transition frames using slope-angle characteristics obtained from linear regression. Third, it extracts dense semantic spatial features at multiple scales using a modified DeepLabv3+ network to segment the selected candidate frames into foreground, background, and wipe pixels. Finally, an optical-flow-based temporal trajectory tracking model is applied to each segmented pixel to recognize object, camera pan, zoom-in, and zoom-out motion patterns. We further remove falsely detected non-transition motion frames to improve wipe transition detection. Experimental results are obtained on the benchmark TRECVID and multimedia datasets. Using pixel-level classification and temporal trajectory analysis, the proposed method achieved average accuracy improvements of 9.28% for object-camera motion recognition, 3.75% for cut transition detection, and 3.01% for wipe transition detection.
“…The feature fusion and clustering algorithm discussed by Feng-Feng Duan et al [11] takes into consideration both the global and the local features of the video interval frames. It recovers not only the features of the compressed domain but also those of the non-compressed domain, which enables exhaustive and precise feature retrieval and fusion.…”
Numerous techniques exist for detecting shot or scene boundaries; they typically rely on visual characteristics of frames and contrast those features between adjacent frames to identify shot boundaries. Shot boundary detection can utilize a range of visual characteristics such as edge features, gray intensity, motion vectors, and color histograms. Existing shot boundary identification methods have demonstrated a relatively high level of effectiveness. Nevertheless, current techniques must be adapted to accommodate diverse video content types, as contemporary video productions employ a wider range of creation methods than their predecessors. The approach presented in this methodology examines different techniques for achieving shot boundary detection. The dual-tree discrete wavelet transformation approach is contrasted with deep learning approaches such as the artificial neural network and the convolutional neural network. The video is given as input, on which frame extraction and feature extraction through entropy calculation and mean-log estimation are realized with the deep learning methodologies. The shot boundary identification outcomes are compared with one another to determine performance, which is presented in the later sections of this research article.
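The entropy-based feature mentioned in the abstract above can be sketched simply: compute the Shannon entropy of each frame's grayscale histogram and flag large entropy changes between consecutive frames. This is a minimal illustration under assumptions of the sketch (synthetic frames, a fixed 32-bin histogram, and a hand-picked threshold), not the paper's trained pipeline.

```python
import numpy as np

def frame_entropy(frame, bins=32):
    """Shannon entropy (bits) of a frame's grayscale histogram."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                      # drop empty bins (0*log 0 := 0)
    return float(-(p * np.log2(p)).sum())

def entropy_boundaries(frames, thresh=0.5):
    """Flag frame indices where entropy jumps by more than thresh bits."""
    ent = np.array([frame_entropy(f) for f in frames])
    return [i + 1 for i in range(len(ent) - 1)
            if abs(ent[i + 1] - ent[i]) > thresh]

# Two synthetic "shots": a flat (near-zero entropy) shot, then a textured one.
rng = np.random.default_rng(2)
flat = np.full((5, 16, 16), 128.0)
textured = rng.uniform(0, 256, size=(5, 16, 16))
boundaries = entropy_boundaries(np.concatenate([flat, textured]))
print(boundaries)  # → [5]
```

In the compared deep learning approaches, such per-frame statistics would be fed to the ANN/CNN classifier rather than thresholded directly.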
“…Duan et al 16 introduced an SBD system based on feature fusion and clustering technology (FFCT). The interval frames of the video sequence are selected, converted, and scaled by sampling to detect transitions.…”
Summary
This paper presents a hybrid feature extraction and optimized deep learning model for effective shot boundary detection (SBD). A cross-guided bilateral filtering technique is used for pre-processing. A hybrid fuzzy histogram with dual-tree complex wavelet transform (FH-DTCWT) is developed to extract visual features from the pre-processed frames of each block. A candidate segment selection process identifies non-boundary frames from the extracted features to improve accuracy. A continuity matrix is created from the possible transition frames to verify that the frames are in sequential order without gaps. Transition types are classified by the proposed optimized deep convolutional neural network. The optimized deep learning (DCNN-RBESO) model combines a deep convolutional neural network (DCNN) with the rider bald eagle search optimization (RBESO) algorithm, which is used to update the weights of the DCNN during learning. The proposed SBD model is developed in MATLAB. Experimental analysis is carried out using the TREC Video Retrieval Evaluation (TRECVID) and VideoSeg datasets. The proposed model outperforms the compared approaches in detecting shot boundaries in terms of precision, recall, and F1-score.
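The candidate segment selection and continuity check described in the summary above can be sketched generically: given per-frame dissimilarity scores, flag frames above an adaptive threshold, then group only gap-free consecutive runs into candidate transition segments (the continuity-matrix idea). The score values, the mean-plus-one-standard-deviation threshold, and the grouping rule are all assumptions of this sketch, not the paper's FH-DTCWT features or DCNN-RBESO classifier.

```python
import numpy as np

def candidate_segments(scores, k=1.0):
    """Flag frames whose dissimilarity exceeds mean + k*std, then group
    consecutive flagged indices into candidate transition segments; a gap
    in the indices ends the current segment (continuity check)."""
    scores = np.asarray(scores, dtype=float)
    t = scores.mean() + k * scores.std()
    idx = np.flatnonzero(scores > t)
    segments, start = [], None
    for i, j in enumerate(idx):
        if start is None:
            start = j
        # Close the segment at the last index or when the run breaks.
        if i == len(idx) - 1 or idx[i + 1] != j + 1:
            segments.append((int(start), int(j)))  # inclusive, gap-free run
            start = None
    return segments

# Toy dissimilarity scores: a three-frame gradual transition at frames 3-5.
scores = [0.1, 0.1, 0.2, 0.9, 1.0, 0.95, 0.1, 0.1, 0.1, 0.1]
segs = candidate_segments(scores)
print(segs)  # → [(3, 5)]
```

Each such segment would then be passed to the transition-type classifier; isolated spikes that fail the continuity check are discarded as non-boundary frames.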