Figure 1: A unified approach to foreground/background video segmentation in unconstrained videos. Our algorithm handles, in a single framework, video sequences that contain highly non-rigid foreground and background motions, complex 3D parallax, simple 2D motions, and severe motion blur.

We address the problem of foreground/background (fg/bg) segmentation of "unconstrained" video. By "unconstrained" we mean that the moving objects and the background scene may be highly non-rigid (e.g., waves in the sea), the camera may undergo complex motion with 3D parallax, and moving objects may suffer from motion blur, large scale changes, illumination changes, etc. Fig. 1 shows a few such examples. Most existing segmentation methods fail on such unconstrained videos, especially in the presence of highly non-rigid motion and low resolution. Unconstrained video has thus become the focus of most recent video segmentation methods [5,6,9,13].

In this paper, we propose a simple yet general algorithm for fg/bg video segmentation that handles complex unconstrained videos. We cast video segmentation as a voting scheme on the graph of similar ("re-occurring") regions in the video sequence. Re-occurring regions may be far apart in both space and time, but are constrained to be close in the appearance feature space. We start from crude saliency votes at each pixel, and iteratively correct those votes by "consensus voting" of re-occurring regions across the video sequence. The power of consensus voting comes from the non-locality of region re-occurrence in both space and time, which enables fast propagation of diverse and rich information across the entire video sequence.
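To make the voting idea concrete, here is a minimal sketch of consensus voting on a nearest-neighbour graph in appearance space. It is not the paper's exact formulation: the Gaussian affinity, the parameters `k`, `sigma` and `iterations`, the per-iteration min-max rescaling, and the hypothetical `consensus_voting` function are all illustrative assumptions. Each region (e.g., a superpixel, assumed here) is represented by an appearance descriptor; its fg vote is repeatedly replaced by the weighted average of the votes of its `k` most similar regions, regardless of where or when those regions occur in the video.

```python
import math


def consensus_voting(features, initial_votes, k=3, sigma=1.0, iterations=10):
    """Hypothetical sketch: refine per-region fg votes by non-local kNN averaging.

    features      -- one appearance descriptor (tuple of floats) per region (assumed)
    initial_votes -- crude fg saliency vote in [0, 1] per region
    """
    n = len(features)
    if len(initial_votes) != n:
        raise ValueError("need exactly one initial vote per region")

    # Build a row-normalised kNN graph in appearance space (spatial/temporal
    # position is ignored, so neighbours may come from distant frames).
    neighbours = []
    for i in range(n):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(features[i], features[j])), j)
            for j in range(n)
            if j != i
        )[:k]
        # Gaussian affinity on squared feature distance (an assumed choice).
        weights = [(j, math.exp(-d2 / (2.0 * sigma ** 2))) for d2, j in dists]
        total = sum(w for _, w in weights)
        if total == 0.0:  # all affinities underflowed: fall back to uniform weights
            weights = [(j, 1.0) for _, j in dists]
            total = float(len(weights))
        neighbours.append([(j, w / total) for j, w in weights])

    votes = list(initial_votes)
    for _ in range(iterations):
        # Each region takes the weighted consensus of its re-occurring regions.
        votes = [sum(w * votes[j] for j, w in neighbours[i]) for i in range(n)]
        # Rescale to [0, 1] so the votes stay interpretable as fg scores.
        lo, hi = min(votes), max(votes)
        if hi > lo:
            votes = [(v - lo) / (hi - lo) for v in votes]
    return votes


if __name__ == "__main__":
    # Toy data: two well-separated appearance clusters (fg near 0, bg near 10),
    # each containing one region with a badly wrong initial vote.
    fg = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
    bg = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    noisy = [0.9, 0.2, 0.8, 0.7, 0.1, 0.6, 0.0, 0.2]
    print([round(v, 2) for v in consensus_voting(fg + bg, noisy)])
```

In the toy run, the fg region initialised at 0.2 and the bg region initialised at 0.6 are pulled toward the consensus of their own appearance cluster, illustrating how a majority of roughly correct votes can override individual errors.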
This non-local propagation enables the correction of large errors in the initial fg/bg votes.

In contrast to trajectory-based methods [1,2,3,4,7,8,10,11], we do not try to explicitly estimate long-term correspondences via flow estimation or tracking. Instead, we obtain long-term "probabilistic" correspondences from re-occurring regions across distant frames. This avoids the inherent uncertainties of explicit optical flow estimation, whose errors tend to accumulate over time. Similarly, MRF-based video segmentation methods [5,6,9,13] tend to propagate information only locally in space-time. Their temporal links are based on optical flow, whose rapidly accumulating errors induce weak (often zero) weights between related parts in distant frames. The segmentation performance of video-MRF methods therefore depends strongly on the quality of their initial fg/bg data term. However, fg/bg initializations tend to be very noisy, whether based on mining moving-object proposals [5,6,13] or on motion saliency maps [9], especially in unconstrained, low-quality videos. As a result, current video segmentation methods encounter difficulties on such challenging videos. In contrast, our non-local consensus voting allows us to start with very noisy fg/bg votes and clean them rapidly through the consensus voting of distant re-occurring regions.

Qualitative and quantitati...