2019
DOI: 10.1109/tnnls.2018.2830119

Enhanced Robot Speech Recognition Using Biomimetic Binaural Sound Source Localization

Abstract: Inspired by the behavior of humans talking in noisy environments, we propose an embodied embedded cognition approach to improve automatic speech recognition (ASR) systems for robots in challenging environments, such as with ego noise, using binaural sound source localization (SSL). The approach is verified by measuring the impact of SSL with a humanoid robot head on the performance of an ASR system. More specifically, a robot orients itself toward the angle where the signal-to-noise ratio (SNR) of speech is ma…

Citations: cited by 37 publications (23 citation statements)
References: 47 publications
“…The aims of acoustic models for the Cocktail Party problem are: identifying multiple speakers and disentangling each speech stream from noisy background. Numerous classical acoustic models are data-driven and based on algorithms of signal processing (Dávila-Chacón et al, 2018). Those models are robust and with good accuracy but lack the prior knowledge, biological plausibility and rely on the large datasets.…”
Section: Computational Models For the Human Cocktail Party Problem So… (mentioning)
confidence: 99%
“…At the same time, MAR models are widely used in forecasting. Within the same scope in [57], a neural network can be used to calculate the audio signal's angle. A forward-bound neural network is then used to deal with the noise.…”
Section: A Classification Of Articles Based On Domain Problems (mentioning)
confidence: 99%
“…Furthermore, background noise can cause telephone channel distortions; suitable system performance in the presence of background noise requires high-quality microphone manufacturing [53]. In addition, to apply the approaches that use beamforming for speech segregation, the number of microphones has to be larger than the number of sound sources [57].…”
Section: B. What Are the Major Challenges in ASR? (RQ2) (mentioning)
confidence: 99%
“…The aims of acoustic models for the Cocktail Party problem are: identifying multiple speakers and disentangling each speech stream from noisy background. Numerous classical acoustic models are data-driven and based on algorithms of signal processing (Dávila-Chacón et al, 2018). Those models are robust and with good accuracy but lack the prior knowledge, biological plausibility and rely on the large datasets.…”
Section: Computational Models (mentioning)
confidence: 99%