In this paper, we present a perception-principles-guided video segmentation method in which statistical modeling and graph-theoretic approaches are combined in a multi-layer classification architecture. Various visual cues are incorporated in a sequential segmentation process. Specifically, low-level pixel-wise features are used in the first layer, where a joint spatio-temporal statistical modeling approach constructs entry-level visual units in space-time. In the second layer, all units are first classified as dynamic or static based on their motion magnitudes. Dynamic units are then parsed into over-segmented moving regions that are connected in space and time, and a mid-level feature, the motion trajectory, is extracted for each moving region. In the third layer, still and moving regions are merged into background and moving objects by a graph-based approach with different similarity metrics. The proposed algorithm employs both long-range motion information, i.e., trajectory, and short-range motion information, i.e., change detection, to retain the temporal continuity and spatial homogeneity of moving objects. The proposed multi-layer structure resembles the joint spatio-temporal and cascade process of perception principles and supports efficient and accurate object segmentation.
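The second- and third-layer steps described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names, the fixed motion threshold, the scalar dissimilarity on each edge, and the union-find merging rule are all assumptions chosen for illustration, and the actual similarity metrics and graph-based merging procedure are not specified here.

```python
from typing import Dict, List, Tuple


def classify_units(motion: Dict[int, float], threshold: float) -> Dict[int, str]:
    """Label a unit 'dynamic' if its motion magnitude exceeds the threshold.

    The threshold is a hypothetical parameter, not a value from the paper.
    """
    return {
        uid: ("dynamic" if mag > threshold else "static")
        for uid, mag in motion.items()
    }


def merge_regions(
    n: int, edges: List[Tuple[int, int, float]], max_dist: float
) -> List[int]:
    """Merge regions joined by an edge whose dissimilarity is at most max_dist.

    Uses union-find as a stand-in for the graph-based merging step (assumption).
    Returns one representative label per region.
    """
    parent = list(range(n))

    def find(x: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Visit the most similar pairs first.
    for u, v, dist in sorted(edges, key=lambda e: e[2]):
        if dist <= max_dist:
            root_u, root_v = find(u), find(v)
            if root_u != root_v:
                parent[root_v] = root_u
    return [find(i) for i in range(n)]


if __name__ == "__main__":
    print(classify_units({0: 0.1, 1: 2.5}, threshold=1.0))
    print(merge_regions(4, [(0, 1, 0.2), (1, 2, 0.9), (2, 3, 0.1)], max_dist=0.5))
```

In this sketch, each edge would carry a dissimilarity computed from whichever metric suits the region pair, for example a trajectory distance between two moving regions, so that regions sharing a label after merging form one object or the background.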