In the field of video segmentation, the majority of methods are based on monocular video. Traditional unsupervised segmentation algorithms perform poorly in both time efficiency and accuracy, because defining the foreground without supervision is a bottleneck. Semi-supervised segmentation algorithms propagate label information from one or more key frames, which are annotated manually and used as masks during processing, to the whole video. They can achieve high accuracy, but they are unsuitable for application scenarios without human interaction. In this paper, we take advantage of a binocular camera and propose an unsupervised algorithm that efficiently extracts the foreground from stereo video. Depth information is embedded into a bilateral grid within the graph cut model, which achieves considerable segmentation accuracy without human interaction. A streaming processing model is integrated to enable online processing of stereo video of arbitrary length. The precision, time efficiency, and adaptability to complex natural scenes of our algorithm are evaluated in experiments against state-of-the-art unsupervised and semi-supervised algorithms.