2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
DOI: 10.1109/iros.2016.7759437
Reverberant sound localization with a robot head based on direct-path relative transfer function

Abstract: This paper addresses the problem of sound-source localization (SSL) with a robot head, which remains a challenge in real-world environments. In particular, we are interested in localizing speech sources, as they are of high interest for human-robot interaction. The microphone-pair response corresponding to the direct-path sound propagation is a function of the source direction. In practice, this response is contaminated by noise and reverberation. The direct-path relative transfer function (DP-RTF) is defined as …
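The abstract is truncated above, but the underlying idea can be illustrated with a small sketch. Assuming a single two-microphone pair and far-field propagation, the snippet below estimates a plain relative transfer function from the recorded signals and maps its phase to an azimuth angle. The function and parameter names (x_ref, x_sec, mic_distance, the 300–3000 Hz band, the least-squares phase fit) are illustrative choices, not the paper's DP-RTF estimator, which additionally separates the direct-path component from noise and reverberation.

```python
import numpy as np

def relative_transfer_function(x_ref, x_sec, nfft=1024, hop=512):
    """Average the per-frame cross-spectrum between a microphone pair and
    normalize by the reference auto-spectrum.  This yields the classical
    (reverberation-contaminated) RTF; the paper's DP-RTF estimator further
    isolates the direct-path part, which is not reproduced here."""
    win = np.hanning(nfft)
    s_cross = np.zeros(nfft // 2 + 1, dtype=complex)
    s_auto = np.zeros(nfft // 2 + 1)
    for i in range(0, len(x_ref) - nfft, hop):
        x_r = np.fft.rfft(win * x_ref[i:i + nfft])
        x_s = np.fft.rfft(win * x_sec[i:i + nfft])
        s_cross += x_s * np.conj(x_r)
        s_auto += np.abs(x_r) ** 2
    return s_cross / np.maximum(s_auto, 1e-12)

def azimuth_from_rtf(rtf, fs, mic_distance, c=343.0):
    """Fit the RTF phase against frequency (least squares) to get a time
    difference of arrival, then convert it to a far-field azimuth."""
    freqs = np.fft.rfftfreq(2 * (len(rtf) - 1), d=1.0 / fs)
    band = (freqs > 300.0) & (freqs < 3000.0)      # speech-dominated band
    phase = np.unwrap(np.angle(rtf[band]))
    tdoa = -np.polyfit(2.0 * np.pi * freqs[band], phase, 1)[0]
    cos_az = np.clip(tdoa * c / mic_distance, -1.0, 1.0)
    return np.degrees(np.arccos(cos_az))
```

With more than two microphones, such pairwise estimates can be combined across pairs; the citation contexts below refer to the four-microphone head of a NAO robot.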

Cited by 24 publications (20 citation statements) · References 33 publications
“…at the beginning of Speaker 3's trajectory (in blue). The possible reasons are that (i) the NAO robot (v5) has relatively strong ego-noise [38], so the signal-to-noise ratio of the recorded signals is relatively low, and (ii) the speakers move with a varying source-to-robot distance, and the direct-path speech is contaminated by more reverberation when the speakers are distant. Overall, DPRTF-REM and DPRTF-EG are able to track the movement, appearance, and disappearance of active speakers most of the time, with a small time lag due to the temporal smoothing.…”
Section: B. Results for the LOCATA Dataset
Mentioning confidence: 99%
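The "small time lag due to the temporal smoothing" mentioned above is a generic property of recursive smoothing. A minimal sketch, assuming per-frame azimuth estimates are already available; the exponential-averaging rule and all names are illustrative, not the recursion actually used by DPRTF-REM or DPRTF-EG:

```python
import numpy as np

def smooth_doa_track(frame_doa_deg, alpha=0.9):
    """Exponentially smooth per-frame azimuth estimates (degrees).
    Smoothing suppresses frame-level jitter caused by noise and
    reverberation, but the recursion reacts to a speaker appearing or
    moving with a delay of roughly 1 / (1 - alpha) frames -- the small
    time lag noted in the quoted passage.  Angles are averaged as unit
    phasors so the 0/360-degree wrap-around is handled correctly."""
    state = None
    smoothed = []
    for doa in frame_doa_deg:
        z = np.exp(1j * np.radians(doa))           # frame estimate as a phasor
        state = z if state is None else alpha * state + (1.0 - alpha) * z
        smoothed.append(np.degrees(np.angle(state)) % 360.0)
    return np.array(smoothed)
```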
“…Hospedales et al. [16] proposed a Bayesian model-based audio-visual fusion framework to segment, associate, and track multiple objects in audio-visual sequences. Li et al. presented an SSL-based HRI system in [17]. They calibrated the pixel coordinates corresponding to the sound sources.…”
Section: B. Audio-Visual Fusion Methods
Mentioning confidence: 99%
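The calibration mentioned for [17] (relating sound sources to pixel coordinates) is not detailed in the excerpt. A common way to map a bearing-only DOA estimate to an image location is a pinhole-camera projection; the sketch below assumes known camera intrinsics (fx, fy, cx, cy) and a DOA expressed in the camera frame, and all names are hypothetical rather than taken from [17]:

```python
import numpy as np

def doa_to_pixel(azimuth_deg, elevation_deg, fx, fy, cx, cy):
    """Project a sound direction, given as azimuth/elevation in the camera
    frame, onto the image plane of a pinhole camera with intrinsics
    fx, fy (focal lengths in pixels) and cx, cy (principal point).
    A bearing-only DOA defines a ray, so the result does not depend on the
    unknown source distance."""
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    # Unit ray in camera coordinates: x right, y down, z forward.
    ray = np.array([np.sin(az) * np.cos(el),
                    -np.sin(el),
                    np.cos(az) * np.cos(el)])
    if ray[2] <= 0.0:
        return None                                # source behind the camera
    u = fx * ray[0] / ray[2] + cx
    v = fy * ray[1] / ray[2] + cy
    return u, v
```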
“…The audio-visual fusion works [17], [18], [19] all use static robots to track the observer. For a moving robot, Evers et al. proposed in [20] an acoustic SLAM framework that differs from the general concept of SLAM.…”
Section: B. Audio-Visual Fusion Methods
Mentioning confidence: 99%
“…Experiments with real data are conducted using a version 5 NAO robot, whose head has four microphones in a horizontal plane [22]. Hence we only perform 360° azimuth localization; … k is computed using the HRTF of NAO.…”
Section: Methods
Mentioning confidence: 99%
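The excerpt suggests that, with four coplanar microphones, only the 360° azimuth is searched and that a per-direction template indexed by k is derived from NAO's HRTF. A minimal grid-search sketch under that reading; the feature choice (normalized complex correlation against HRTF-derived, DP-RTF-like templates) and all variable names are assumptions, not the cited method:

```python
import numpy as np

def localize_on_azimuth_grid(observed_feature, hrtf_feature_table, grid_deg):
    """Grid search over candidate azimuths.  `hrtf_feature_table` is assumed
    to hold one precomputed, HRTF-derived feature vector per candidate
    direction (shape: n_directions x n_features, complex-valued), and
    `observed_feature` the feature vector estimated from the current
    microphone signals.  The candidate whose template is most parallel to
    the observation (normalized complex correlation) wins."""
    obs = observed_feature / (np.linalg.norm(observed_feature) + 1e-12)
    table = hrtf_feature_table / (
        np.linalg.norm(hrtf_feature_table, axis=1, keepdims=True) + 1e-12)
    scores = np.abs(table @ np.conj(obs))          # |cosine similarity| per direction
    return float(grid_deg[int(np.argmax(scores))]), scores

# A 1-degree grid covering the full circle around the robot head.
grid_deg = np.arange(0.0, 360.0, 1.0)
```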