Abstract:Abstract. Mean shift is a nonparametric estimator of density which has been applied to image and video segmentation. Traditional mean shift based segmentation uses a radially symmetric kernel to estimate local density, which is not optimal in view of the often structured nature of image and more particularly video data. In this paper we present an anisotropic kernel mean shift in which the shape, scale, and orientation of the kernels adapt to the local structure of the image or video. We decompose the anisotro… Show more
“…Each region is given a unique color to evaluate longterm coherence and boundary consistency. Our segmentation exhibits consistent region identity and stable boundaries under conditions such as significant motion (waterskier rotating around own axis, first row) and dynamic [18,19]. BottomRight: Our tooned segmentation result is similar but features better region boundaries, indicated by evaluating the boundary of the girl over 10 frames (top middle).…”
Section: Resultsmentioning
confidence: 62%
“…They treat the video as a 3D space-time volume [12], and typically use a variant of the mean shift algorithm [5] for segmentation [6,18]. Dementhon [6] applied mean shift on a 3D lattice and used a hierarchical strategy to cluster the space-time video stack for computational efficiency.…”
Section: Related Workmentioning
confidence: 99%
“…Dementhon [6] applied mean shift on a 3D lattice and used a hierarchical strategy to cluster the space-time video stack for computational efficiency. Wang et al [19] used anisotropic kernel mean shift segmentation [18] for video tooning. Wang and Adelson [20] used motion heuristics to iteratively segment video frames into motion consistent layers.…”
We present an efficient and scalable technique for spatiotemporal segmentation of long video sequences using a hierarchical graph-based algorithm. We begin by oversegmenting a volumetric video graph into space-time regions grouped by appearance. We then construct a "region graph" over the obtained segmentation and iteratively repeat this process over multiple levels to create a tree of spatio-temporal segmentations. This hierarchical approach generates high quality segmentations, which are temporally coherent with stable region boundaries, and allows subsequent applications to choose from varying levels of granularity. We further improve segmentation quality by using dense optical flow to guide temporal connections in the initial graph. We also propose two novel approaches to improve the scalability of our technique: (a) a parallel outof-core algorithm that can process volumes much larger than an in-core algorithm, and (b) a clip-based processing algorithm that divides the video into overlapping clips in time, and segments them successively while enforcing consistency. We demonstrate hierarchical segmentations on video shots as long as 40 seconds, and even support a streaming mode for arbitrarily long videos, albeit without the ability to process them hierarchically.
“…Each region is given a unique color to evaluate longterm coherence and boundary consistency. Our segmentation exhibits consistent region identity and stable boundaries under conditions such as significant motion (waterskier rotating around own axis, first row) and dynamic [18,19]. BottomRight: Our tooned segmentation result is similar but features better region boundaries, indicated by evaluating the boundary of the girl over 10 frames (top middle).…”
Section: Resultsmentioning
confidence: 62%
“…They treat the video as a 3D space-time volume [12], and typically use a variant of the mean shift algorithm [5] for segmentation [6,18]. Dementhon [6] applied mean shift on a 3D lattice and used a hierarchical strategy to cluster the space-time video stack for computational efficiency.…”
Section: Related Workmentioning
confidence: 99%
“…Dementhon [6] applied mean shift on a 3D lattice and used a hierarchical strategy to cluster the space-time video stack for computational efficiency. Wang et al [19] used anisotropic kernel mean shift segmentation [18] for video tooning. Wang and Adelson [20] used motion heuristics to iteratively segment video frames into motion consistent layers.…”
We present an efficient and scalable technique for spatiotemporal segmentation of long video sequences using a hierarchical graph-based algorithm. We begin by oversegmenting a volumetric video graph into space-time regions grouped by appearance. We then construct a "region graph" over the obtained segmentation and iteratively repeat this process over multiple levels to create a tree of spatio-temporal segmentations. This hierarchical approach generates high quality segmentations, which are temporally coherent with stable region boundaries, and allows subsequent applications to choose from varying levels of granularity. We further improve segmentation quality by using dense optical flow to guide temporal connections in the initial graph. We also propose two novel approaches to improve the scalability of our technique: (a) a parallel outof-core algorithm that can process volumes much larger than an in-core algorithm, and (b) a clip-based processing algorithm that divides the video into overlapping clips in time, and segments them successively while enforcing consistency. We demonstrate hierarchical segmentations on video shots as long as 40 seconds, and even support a streaming mode for arbitrarily long videos, albeit without the ability to process them hierarchically.
“…The proposed method is efficiently exploits the color similarity of the target. The proposed method provides a general region merging framework, it does not depend initially mean shift segmentation method or other color image segmentation methods [20,24,25,36] can also be used for segmentation. Also we can use appending the different object part to obtaining complete object from complex scene, and also we can use some supervised technique also.…”
Abstract-In this work; we address a novel interactive framework for object retrieval using unsupervised similar region merging and flood fill method which models the spatial and appearance relations among image pixels. Efficient and effective image segmentation is usually very hard for natural and complex images. This paper presents a new technique for similar region merging and objects retrieval. The users only need to roughly indicate the after which steps desired objects boundary is obtained during merging of similar regions. A novel similarity based region merging mechanism is proposed to guide the merging process with the help of mean shift technique. A region R is merged with its adjacent regions Q if Q has highest similarity with R among all Q's adjacent regions. The proposed method automatically merges the regions that are initially segmented through mean shift technique, and then effectively extracts the object contour by merging all similar regions. Extensive experiments are performed on 22 object classes (524 images total) show promising results.
“…After it was introduced to the field of computer vision [6], mean shift has been adopted to solve various problems, such as image filtering, segmentation [3,13,15,[18][19] and object tracking [1, 2, 8-10, 12, 14, 16, 17].…”
Abstract:The background-weighted histogram (BWH) algorithm proposed in [2] attempts to reduce the interference of background in target localization in mean shift tracking. However, in this paper we prove that the weights assigned to pixels in the target candidate region by BWH are proportional to those without background information, i.e. BWH does not introduce any new information because the mean shift iteration formula is invariant to the scale transformation of weights. We then propose a corrected BWH (CBWH) formula by transforming only the target model but not the target candidate model. The CBWH scheme can effectively reduce background's interference in target localization. The experimental results show that CBWH can lead to faster convergence and more accurate localization than the usual target representation in mean shift tracking. Even if the target is not well initialized, the proposed algorithm can still robustly track the object, which is hard to achieve by the conventional target representation.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.