Enhanced Robot Speech Recognition Using Biomimetic Binaural Sound Source Localization

Dávila-Chacón, Jorge; Liu, Jindong; Wermter, Stefan

doi:10.1109/tnnls.2018.2830119

Cited by 37 publications

(23 citation statements)

References 47 publications

“…The aims of acoustic models for the Cocktail Party problem are: identifying multiple speakers and disentangling each speech stream from noisy background. Numerous classical acoustic models are data-driven and based on algorithms of signal processing (Dávila-Chacón et al, 2018). Those models are robust and with good accuracy but lack the prior knowledge, biological plausibility and rely on the large datasets.…”

Section: Computational Models For the Human Cocktail Party Problem So (mentioning)

confidence: 99%

What Can Computational Models Learn From Human Selective Attention? A Review From an Audiovisual Unimodal and Crossmodal Perspective

Weber, Yang, et al. 2020. Front. Integr. Neurosci.

Self Cite

Selective attention plays an essential role in information acquisition and utilization from the environment. In the past 50 years, research on selective attention has been a central topic in cognitive science. Compared with unimodal studies, crossmodal studies are more complex but necessary to solve real-world challenges in both human experiments and computational modeling. Although an increasing number of findings on crossmodal selective attention have shed light on humans' behavioral patterns and neural underpinnings, a much better understanding is still necessary to yield the same benefit for intelligent computational agents. This article reviews studies of selective attention in unimodal visual and auditory and crossmodal audiovisual setups from the multidisciplinary perspectives of psychology and cognitive neuroscience, and evaluates different ways to simulate analogous mechanisms in computational models and robotics. We discuss the gaps between these fields in this interdisciplinary review and provide insights about how to use psychological findings and theories in artificial intelligence from different perspectives.

“…At the same time, MAR models are widely used in forecasting. Within the same scope in [57], a neural network can be used to calculate the audio signal's angle. A forward-bound neural network is then used to deal with the noise.…”

Section: A Classification Of Articles Based On Domain Problems (mentioning)

confidence: 99%

“…Furthermore, background noise can cause telephone channel distortions; suitable system performance in the presence of background noise requires high-quality microphone manufacturing [53]. In addition, to apply the approaches that use beamforming for speech segregation, the number of microphones has to be larger than the number of sound sources [57].…”

Section: B. What Are the Major Challenges in ASR? (RQ2) (mentioning)

confidence: 99%

“…Therefore, any locative property changes of the sound during the observation of received signals were caused by the change in the acoustic channel, which resulted in decreasing the performance and affecting the outcomes. In [57], the authors proposed an embedded cognition method to improve ASR for robots, using microphone arrays to locate the speech sources. They then separated the speech signals from background noise.…”

Section: What Are the Current Research Gaps in ASR? (RQ3) (mentioning)

confidence: 99%

Automatic Speech Recognition: Systematic Literature Review

et al. 2021

A huge amount of research has been done in the field of speech signal processing in recent years. In particular, there has been increasing interest in the automatic speech recognition (ASR) technology field. ASR began with simple systems that responded to a limited number of sounds and has evolved into sophisticated systems that respond fluently to natural language. This systematic review of automatic speech recognition is provided to help other researchers with the most significant topics published in the last six years. This research will also help in identifying recent major ASR challenges in real-world environments. In addition, it discusses current research gaps in ASR. This review covers articles available in five research databases that were completed according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) protocol. The search strategy yielded 45 articles related to the study's scope for the period 2015-2020. The results presented in this review shed light on research trends in the area of ASR and also suggest new research directions.

“…The aims of acoustic models for the Cocktail Party problem are: identifying multiple speakers and disentangling each speech stream from noisy background. Numerous classical acoustic models are data-driven and based on algorithms of signal processing (Dávila-Chacón et al, 2018). Those models are robust and with good accuracy but lack the prior knowledge, biological plausibility and rely on the large datasets.…”

Section: Computational Models (mentioning)

confidence: 99%

What can computational models learn from human selective attention? A review from an audiovisual crossmodal perspective

Fu, Weber, Yang, et al. 2019

Preprint

Self Cite

Selective attention plays an essential role in information acquisition and utilization from the environment. In the past 50 years, research on selective attention has been a central topic in cognitive science. Compared with unimodal studies, crossmodal studies are more complex but necessary to solve real-world challenges in both human experiments and computational modeling. Although an increasing number of findings on crossmodal selective attention have shed light on humans' behavioral patterns and neural underpinnings, a much better understanding is still necessary to yield the same benefit for computational intelligent agents. This article reviews studies of selective attention in unimodal visual and auditory and crossmodal audiovisual setups from the multidisciplinary perspectives of psychology and cognitive neuroscience, and evaluates different ways to simulate analogous mechanisms in computational models and robotics. We discuss the gaps between these fields in this interdisciplinary review and provide insights about how to use psychological findings and theories in artificial intelligence from different perspectives.
