Selecting distinctive keyframes that represent the salient contents of a video is a critical task in video processing applications that rely on content analysis or information retrieval. Although many existing keyframe selection techniques capture salient visual content satisfactorily, they often fail to adequately highlight the changes in visual information brought about by the motion of objects between frames. In this paper, we propose a keyframe selection technique that formulates the dissimilarity between two frames of a video shot in terms of the change in orientation that corresponding objects in the two frames have undergone due to motion. This is accomplished by steerable filtering of the frames to extract the local orientation of pixels within each frame. The frame-to-frame dissimilarity is adaptively thresholded over a group of frames to select the keyframes. In essence, keyframes are selected at the temporal instances where the change in orientation attains a local maximum. Our keyframe selection methodology is specifically relevant to video colourization, because the keyframes employed for colourization must capture all orientational changes effectively while ensuring adequate content coverage.
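The selection rule described above can be illustrated with a minimal sketch. It is not the authors' implementation: plain finite-difference gradients stand in for the steerable filter bank, and the gradient-weighted angular difference, the window size `window`, the threshold rule (local mean plus `k` standard deviations), and the function names are all assumptions introduced for illustration.

```python
import numpy as np

def local_orientation(frame):
    """Per-pixel gradient orientation in [0, pi) and gradient magnitude.

    Assumption: finite-difference gradients replace the steerable filters.
    """
    gy, gx = np.gradient(frame.astype(float))
    theta = np.mod(np.arctan2(gy, gx), np.pi)
    return theta, np.hypot(gx, gy)

def orientation_dissimilarity(a, b):
    """Gradient-weighted mean change in local orientation between two frames."""
    ta, ma = local_orientation(a)
    tb, mb = local_orientation(b)
    d = np.abs(ta - tb)
    d = np.minimum(d, np.pi - d)  # orientations are equivalent modulo pi
    w = ma + mb                   # flat regions carry no orientation evidence
    total = w.sum()
    return float((d * w).sum() / total) if total > 0 else 0.0

def select_keyframes(frames, window=5, k=1.0):
    """Indices of frames that follow a locally maximal orientation change.

    The threshold is adaptive: mean + k * std of the dissimilarities inside
    a sliding window around each position (hypothetical rule).
    """
    d = np.array([orientation_dissimilarity(frames[i], frames[i + 1])
                  for i in range(len(frames) - 1)])
    keys = []
    for i in range(len(d)):
        lo, hi = max(0, i - window), min(len(d), i + window + 1)
        thresh = d[lo:hi].mean() + k * d[lo:hi].std()
        left = d[i - 1] if i > 0 else -np.inf
        right = d[i + 1] if i + 1 < len(d) else -np.inf
        if d[i] > thresh and d[i] >= left and d[i] >= right:
            keys.append(i + 1)  # assumption: keep the frame after the change
    return keys
```

Each frame is expected as a 2D grayscale array. Whether the frame before or after the peak is kept is a design choice that the abstract does not specify.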
Because imaging is a 2D projection of a 3D scene, depth information is lost when an image is captured with a conventional camera. This depth information can be inferred from a set of visual cues present in the image. In this work, we present a model that combines two monocular depth cues, namely texture and defocus. Depth is related to the spatial extent of the defocus blur by assuming that the more an object is blurred, the farther it is from the camera. First, we estimate the amount of defocus blur present at the edge pixels of an image; this is referred to as the sparse defocus map. From the sparse defocus map we generate the full defocus map. However, such defocus maps always contain hole regions and ambiguity in depth. To handle this problem, an additional depth cue, in our case texture, is integrated to generate a better defocus map. This integration mainly focuses on modifying the erroneous regions in the defocus map by using the texture energy present in those regions. The sparse defocus map is corrected using texture-based rules. The hole regions, where there are no significant edges or texture, are detected and corrected in the sparse defocus map. We use region-wise propagation to generate the full defocus map, which increases its accuracy.
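One way to obtain a sparse defocus map at edge pixels is sketched below. The abstract does not name the blur estimator, so this sketch assumes a gradient-ratio approach: the image is re-blurred with a known Gaussian of width `sigma0`, and for an ideal step edge the ratio R of gradient magnitudes before and after re-blurring gives the blur as sigma = sigma0 / sqrt(R^2 - 1). The function names, the `sigma0` default, and the externally supplied `edge_mask` (for example from an edge detector) are hypothetical; the texture-based correction and region-wise propagation steps are not covered.

```python
import numpy as np

def gaussian_blur(image, sigma):
    """Separable Gaussian blur using NumPy only (edge-replicated borders)."""
    radius = int(np.ceil(4 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(image.astype(float), radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def gradient_magnitude(image):
    gy, gx = np.gradient(image)
    return np.hypot(gx, gy)

def sparse_defocus_map(image, edge_mask, sigma0=1.0):
    """Estimated blur (Gaussian sigma) at edge pixels, NaN everywhere else.

    Assumption: gradient-ratio estimator under an ideal step-edge model.
    """
    image = image.astype(float)
    g = gradient_magnitude(image)
    g_reblur = gradient_magnitude(gaussian_blur(image, sigma0))
    sparse = np.full(image.shape, np.nan)
    valid = edge_mask & (g_reblur > 0)
    ratio = np.ones(image.shape)
    ratio[valid] = g[valid] / g_reblur[valid]
    valid &= ratio > 1.0  # the step-edge model requires R > 1
    sparse[valid] = sigma0 / np.sqrt(ratio[valid]**2 - 1.0)
    return sparse
```

`edge_mask` must be a boolean array of the same shape as the image. Pixels left as NaN are the ones a later propagation step would have to fill, and finite-difference gradients make the estimates slightly biased on discrete images.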