2009 IEEE International Conference on Acoustics, Speech and Signal Processing
DOI: 10.1109/icassp.2009.4960384

Audio-assisted trajectory estimation in non-overlapping multi-camera networks

Abstract: We present an algorithm to improve trajectory estimation in networks of non-overlapping cameras using audio measurements. The algorithm fuses audiovisual cues in each camera's field of view and recovers trajectories in unobserved regions using microphones only. Audio source localization is performed using a Stereo Audio and Cycloptic Vision (STAC) sensor by estimating the time difference of arrival (TDOA) between the microphone pair via cross correlation. Audio estimates are then smoothed using…
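The abstract does not give details of the TDOA step, so the following is only a minimal sketch of how a cross-correlation-based TDOA estimate could be computed; the function name, signal lengths, sampling rate, and sign convention below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def estimate_tdoa(left, right, fs):
    """Estimate the time difference of arrival (seconds) between two
    microphone channels from the peak of their cross correlation."""
    cc = np.correlate(left, right, mode="full")        # lags -(len(right)-1) .. len(left)-1
    lag = int(np.argmax(np.abs(cc))) - (len(right) - 1)
    return lag / fs

# Hypothetical usage: the right channel lags the left by 5 ms at 16 kHz.
fs = 16_000
rng = np.random.default_rng(0)
src = rng.standard_normal(2_000)                       # short noise burst as a stand-in source
d = int(0.005 * fs)                                    # 80-sample delay
left = src
right = np.concatenate((np.zeros(d), src[:-d]))
print(estimate_tdoa(left, right, fs))                  # ≈ -0.005 s under this sign convention
```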

Cited by 13 publications (14 citation statements) · References 9 publications
“…Figures 2-3 show snapshots of the tracking results in two of the sequences. For the first sequence, shown in Fig. 2, we have shown the particle blobs as well as the final estimates.…”
Section: Simulation Results
confidence: 99%
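The "final estimates" reported alongside the particle blobs are typically obtained as the weighted mean of the particle set; the sketch below shows that step under this assumption and is not code from the cited paper.

```python
import numpy as np

def particle_estimate(particles, weights):
    """Final state estimate as the weighted mean of the particle set
    (assumed convention; the cited work may use a different point estimate)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()            # normalize importance weights
    return np.average(np.asarray(particles, dtype=float), axis=0, weights=weights)

# Hypothetical usage: 2-D particle positions with their importance weights.
particles = [[1.0, 2.0], [1.2, 2.1], [0.9, 1.8]]
weights = [0.2, 0.5, 0.3]
print(particle_estimate(particles, weights))     # ≈ [1.07, 1.99]
```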
“…However, the availability itself needs to be determined and can be erroneous in the presence of clutter measurements or measurements corresponding to other speaking and visible targets. The most common solution is to linearly combine the measurement likelihoods of the visual and audio observations, where the weights of the combination are adjusted dynamically according to an acoustic confidence measure [3] or using separate confidence measures for the audio and video channels [4]. However, linear combination of the two likelihoods is mainly heuristic and not mathematically accurate.…”
Section: Introduction
confidence: 99%
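As a rough illustration of the linear likelihood combination described above, the sketch below blends per-modality likelihoods with confidence-based weights; the Gaussian likelihood form, variable names, and noise levels are assumptions, not taken from [3] or [4].

```python
import numpy as np

def gaussian_likelihood(z, x, sigma):
    """Likelihood of measurement z given state position x (assumed Gaussian)."""
    return np.exp(-0.5 * ((z - x) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def fused_likelihood(x, z_audio, z_video, conf_audio, conf_video,
                     sigma_audio=0.5, sigma_video=0.2):
    """Linearly combine audio and video measurement likelihoods.

    The weights are normalized confidence scores, mimicking the dynamically
    adjusted weights the citation mentions; everything here is illustrative.
    """
    w_a = conf_audio / (conf_audio + conf_video)
    w_v = 1.0 - w_a
    return (w_a * gaussian_likelihood(z_audio, x, sigma_audio)
            + w_v * gaussian_likelihood(z_video, x, sigma_video))

# Hypothetical usage for a candidate state at x = 1.0 m.
print(fused_likelihood(1.0, z_audio=1.3, z_video=1.05,
                       conf_audio=0.4, conf_video=0.9))
```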
“…But the number of synchronized sensors is usually limited by the maximal output current of one camera. In [10][11][12], the authors employed a multi-modal detection and tracking algorithm. They localized the audio source by estimating the time difference of arrival and improved trajectory estimation in networks of non-overlapping cameras using audio measurements.…”
Section: Multimodal Fusion and Sensor Collaboration
confidence: 99%
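The audio-localization step mentioned here (mapping a TDOA estimate to a source direction) can be sketched under a far-field assumption; the microphone spacing and speed of sound below are illustrative and do not reflect the specific geometry used in [10]-[12].

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at roughly 20 °C

def tdoa_to_bearing(tdoa, mic_spacing=0.2):
    """Convert a TDOA (seconds) between two microphones into a bearing angle
    (radians) under a far-field assumption; 0.2 m spacing is illustrative."""
    ratio = np.clip(SPEED_OF_SOUND * tdoa / mic_spacing, -1.0, 1.0)
    return np.arcsin(ratio)

# Hypothetical usage: a 0.3 ms delay maps to roughly 31 degrees off broadside.
print(np.degrees(tdoa_to_bearing(0.3e-3)))
```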
“…In this case, as the microphone pair is unable to provide the localization information, we estimate the trajectory using a first-order motion model. Finally, we perform the audio-visual fusion within a Kalman filter using a weighted sum of the two measurements [10], which penalizes audio detections only in overlapping regions and gives a weight no smaller than 0.5 to the video modality, when available.…”
Section: Audio-visual Trajectory Estimation
confidence: 99%
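The weighted-sum fusion described above can be sketched as a pre-fused measurement fed into a simple 1-D Kalman update; the motion model, noise values, and the clamping of the video weight at 0.5 are assumptions about how such a scheme might look, not code from the cited work.

```python
def fuse_measurements(z_audio, z_video, w_video=0.7):
    """Weighted sum of audio and video position measurements.

    Per the cited description, the video modality, when available, receives a
    weight of at least 0.5; the exact weighting logic here is assumed.
    """
    if z_video is None:               # video unavailable: fall back to audio alone
        return z_audio
    w_video = max(w_video, 0.5)       # never down-weight video below 0.5
    return w_video * z_video + (1.0 - w_video) * z_audio

def kalman_update(x, P, z, R=0.05, Q=0.01):
    """One predict/update step of a 1-D constant-position Kalman filter
    (illustrative process and measurement noise values)."""
    x_pred, P_pred = x, P + Q         # predict with an identity motion model
    K = P_pred / (P_pred + R)         # Kalman gain
    x_new = x_pred + K * (z - x_pred) # update with the fused measurement
    P_new = (1.0 - K) * P_pred
    return x_new, P_new

# Hypothetical usage over a few frames (None marks a missing video detection).
x, P = 0.0, 1.0
for z_a, z_v in [(0.9, 1.0), (1.4, None), (2.1, 2.0)]:
    z = fuse_measurements(z_a, z_v)
    x, P = kalman_update(x, P, z)
    print(round(x, 3), round(P, 3))
```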