2014
DOI: 10.1016/j.image.2014.03.004

Stereo object tracking with fusion of texture, color and disparity information

Abstract: A novel method for visual object tracking in stereo videos is proposed, which fuses an appearance-based representation of the object based on Local Steering Kernel features with 2D color-disparity histogram information. The algorithm employs Kalman filtering for object position prediction and a sampling technique for selecting the candidate object regions of interest in the left and right channels. Disparity information is exploited for matching corresponding regions in the left and right video frames. As trac…
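The abstract mentions Kalman filtering for predicting the object's position between frames. A minimal sketch of that idea, using a constant-velocity motion model, is shown below; the state layout, noise covariances, and class name are illustrative assumptions, not the paper's actual parameters.

```python
import numpy as np

class KalmanTracker:
    """Constant-velocity Kalman filter for 2D position prediction.

    State vector: [x, y, vx, vy]. Only the position (x, y) is observed,
    e.g. from the tracker's appearance-based detection in each frame.
    """

    def __init__(self, x0, y0, dt=1.0):
        self.x = np.array([x0, y0, 0.0, 0.0])           # initial state
        self.P = np.eye(4) * 10.0                       # state covariance
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1,  0],
                           [0, 0, 0,  1]], dtype=float)  # motion model
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)   # observe position only
        self.Q = np.eye(4) * 0.01                        # process noise (assumed)
        self.R = np.eye(2) * 1.0                         # measurement noise (assumed)

    def predict(self):
        """Propagate the state one frame ahead; returns predicted (x, y)."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, zx, zy):
        """Correct the state with a measured position (zx, zy)."""
        z = np.array([zx, zy])
        innovation = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)         # Kalman gain
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

In a stereo setting such as the paper's, one such filter per channel (or a shared filter on rectified coordinates) can supply the predicted region around which candidate regions of interest are sampled.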

Cited by 18 publications (17 citation statements)
References 56 publications
“…The video stream is first segmented into shots. Then, face detection and tracking [28] [29] [30] is performed on the segmented video clips. Clustering is applied on the extracted facial images, in order to determine which images belong to the same character.…”
Section: Related Work A. Video Summarization
confidence: 99%
“…As in the case of speakers, each face appearance consists simply of a video segment that starts and ends at the temporal boundaries of an uninterrupted face appearance. Such data may have been acquired through the successive application of face detection [52], face tracking [53], face clustering [54] and label propagation [55] algorithms.…”
Section: Multimodal Shot Pruning (MSP)
confidence: 99%
“…As in the case of speakers, each face appearance consists simply of a video segment that starts and ends at the temporal boundaries of an uninterrupted face appearance. Such data may have been acquired through the successive application of face detection [12], face tracking [13], face clustering [14] and label propagation [15] algorithms. Despite these algorithmic prerequisites, no extra data modalities (such as the movie script) are required, beyond the film itself.…”
Section: Multimodal Shot Pruning (MSP)
confidence: 99%
“…The intuition behind the modalities fusion is that one can perform a similar to speaker diarization analysis upon the visual data: face clustering. In more detail, assume that faces are detected in the frames of a movie and then the detected faces are tracked over time, resulting in a number of video facial trajectories [13], [2], [5]. A representative face is selected to represent a facial trajectory.…”
Section: Introduction
confidence: 99%