Multimodal Signal Processing 2012
DOI: 10.1017/cbo9781139136310.006

Sampling techniques for audio-visual tracking and head pose estimation

Abstract: Analyzing people's behavior in smart environments using multimodal sensors requires answering a set of typical questions: who are the people, where are they, what activities are they doing, when, with whom are they interacting, and how. In this view, locating people or their faces and characterizing them (e.g., extracting their body or head orientation) makes it possible to address the first two questions (who and where), and is usually one of the first steps before applying higher-level multimodal scene analysis algorithms…

Cited by 4 publications (2 citation statements)
References 33 publications (38 reference statements)
“…Therefore, the current technologies must be improved to a level that is robust and usable. The most significant challenge facing visual tracking systems in an indoor environment (e.g., meeting scenarios) is still the reliable detection of persons when their appearance changes from different camera views, with partial occlusions occurring in natural, uncontrolled environments [19, 21]. According to the evaluation results presented by the CLEAR 2006 workshop for visual tracking tasks, a tracker which used a single input video stream from a top-view camera was the best-performing tracker compared with other approaches based on the fusion of multiple camera streams [31].…”
Section: Related Work (mentioning)
confidence: 99%
“…An essential building block for scene analysis is the detection and tracking of objects in the scene. The biggest challenge facing visual tracking systems in indoor environments (e.g., meeting scenarios) is still the reliable detection of persons when their appearance changes from different camera views, with partial occlusions occurring in natural uncontrolled environments [19, 20, 21].…”
Section: Introduction (mentioning)
confidence: 99%