ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
DOI: 10.1109/icassp40776.2020.9054690

Audio-Visual Calibration with Polynomial Regression for 2-D Projection Using SVD-PHAT

Abstract: This paper proposes a straightforward 2-D method to spatially calibrate the visual field of a camera with the auditory field of an array microphone by generating and overlaying an acoustic image over an optical image. Using a low-cost microphone array and an off-the-shelf camera, we show that polynomial regression can deal efficiently with non-linear camera distortion, and that a recently proposed sound source localization method for real-time processing, SVD-PHAT, can be adapted for this task.
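The abstract does not give the regression formulation, so the following is only a minimal sketch of how a polynomial mapping from acoustic-image coordinates to camera pixel coordinates could be fitted by least squares. The function names (`poly_features`, `fit_mapping`, `apply_mapping`), the polynomial degree, the synthetic radial-distortion model, and the 640x480 image geometry are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def poly_features(points, degree=3):
    """Build all 2-D monomials x^i * y^j with i + j <= degree (hypothetical helper)."""
    x, y = points[:, 0], points[:, 1]
    cols = [x**i * y**j
            for i in range(degree + 1)
            for j in range(degree + 1 - i)]
    return np.stack(cols, axis=1)

def fit_mapping(acoustic_xy, pixel_uv, degree=3):
    """Least-squares fit of polynomial coefficients, one column per pixel axis."""
    A = poly_features(acoustic_xy, degree)
    coeffs, *_ = np.linalg.lstsq(A, pixel_uv, rcond=None)
    return coeffs

def apply_mapping(coeffs, acoustic_xy, degree=3):
    """Project acoustic-image coordinates onto the optical image."""
    return poly_features(acoustic_xy, degree) @ coeffs

def distort(p, k=0.1):
    """Assumed synthetic camera model: radial distortion plus scale and offset."""
    r2 = np.sum(p**2, axis=1, keepdims=True)
    return np.array([320.0, 240.0]) + 300.0 * p * (1.0 + k * r2)

# Synthetic calibration pairs standing in for measured (sound source, pixel) matches.
rng = np.random.default_rng(0)
acoustic = rng.uniform(-1.0, 1.0, size=(50, 2))
pixels = distort(acoustic)

coeffs = fit_mapping(acoustic, pixels, degree=3)

# Map new acoustic coordinates into the image and compare with the model.
test_pts = np.array([[0.0, 0.0], [0.5, -0.5]])
pred = apply_mapping(coeffs, test_pts, degree=3)
max_err = np.max(np.abs(pred - distort(test_pts)))
```

In this sketch the assumed distortion is itself a cubic polynomial, so a degree-3 fit reproduces it almost exactly; with real calibration data the degree would have to be chosen by checking the residual error on held-out points.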

Cited by 3 publications (2 citation statements)
References 19 publications
“…François Grondin [51] Proposed a simple 2-D approach by creating and overlaying the acoustic image with the visual field of a camera with the auditory field of an array microphone. Polynomial regression can effectively resolve non-linear video distortion using a low-cost microphone array and off-stage camera and that SVD-PHAT, a newly suggested approach for real-time analysis of sound sources can be tailored for this role.…”
Section: III-A Review On (Linear Regression)
Citation type: mentioning
confidence: 99%
“…In this scenario, there are two speech sources, the target and interference, and it is assumed that these sources have different DOAs. SteerNet assumes that the DOA of the target speech is available and is obtained using sound source localization methods [26,27,28], or using a visual cue when both optical and acoustic images are properly aligned [29].…”
Section: SteerNet
Citation type: mentioning
confidence: 99%