“…Many methods have been proposed for computing supervoxels, including energy minimization by graph cut [38], non-parametric feature-space analysis [28], graph-based merging [9], [13], [42], contour-evolving optimization [17], [21], [31], optimization of normalized cuts [33], [7], generative probabilistic frameworks [5], and hybrid clustering [30], [43]. These methods can be classified according to their representation format: (1) temporal superpixels [5], [4], [17], [21], [30], [31], [39]: supervoxels are represented in each frame, and their labels are temporally consistent across adjacent frames; and (2) supervoxels [7], [9], [13], [28], [33], [38], [42], [43]: 3D primitive volumes whose union forms the video volume. Note that these two representations can be converted into each other.…”
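The conversion between the two representations can be illustrated with a minimal sketch. It assumes, for illustration only, that both formats are stored as integer label maps: a temporal-superpixel result is a list of per-frame 2D label maps whose labels are consistent over time, and a supervoxel result is a single T x H x W label volume. The function names and the nested-list layout are hypothetical and do not come from any of the cited methods.

```python
from typing import Dict, List

Frame = List[List[int]]   # one 2D label map (H x W), assumed layout
Volume = List[Frame]      # T stacked label maps (T x H x W), assumed layout


def temporal_superpixels_to_supervoxels(frames: List[Frame]) -> Volume:
    """Stack per-frame label maps into one volume.

    Pixels that share a label across frames form one supervoxel.
    """
    if not frames:
        return []
    height, width = len(frames[0]), len(frames[0][0])
    for frame in frames:
        if len(frame) != height or any(len(row) != width for row in frame):
            raise ValueError("all frames must have the same size")
    return [[row[:] for row in frame] for frame in frames]


def supervoxels_to_temporal_superpixels(volume: Volume) -> List[Frame]:
    """Slice the label volume frame by frame.

    Each slice is a 2D label map; labels stay consistent over time
    because they come from the same 3D region.
    """
    return [[row[:] for row in frame] for frame in volume]


def frames_per_label(volume: Volume) -> Dict[int, List[int]]:
    """For each label, list the frame indices in which it appears."""
    spans: Dict[int, List[int]] = {}
    for t, frame in enumerate(volume):
        for label in sorted({value for row in frame for value in row}):
            spans.setdefault(label, []).append(t)
    return spans


# Two 2x3 frames; label 2 appears only in the second frame.
frames = [
    [[0, 0, 1], [0, 1, 1]],
    [[0, 1, 1], [2, 2, 1]],
]
volume = temporal_superpixels_to_supervoxels(frames)
round_trip = supervoxels_to_temporal_superpixels(volume)
```

Under this storage assumption the conversion is only a change of view (stacking or slicing). One caveat: the slice of a supervoxel in a given frame is not guaranteed to be a single connected region, so a method that requires connected superpixels would need an extra per-frame relabeling step.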