2016
DOI: 10.1109/lsp.2016.2592683

Microphone Array Processing Strategies for Distant-Based Automatic Speech Recognition

citations
Cited by 8 publications
(4 citation statements)
references
References 11 publications
“…Handling multiple users : The presence of more than one user speaking at the same time, creates the effect of crosstalk, making it difficult for the system to correctly transcribe the speech. To minimize this problem, end‐to‐end systems are trained to perform source separation and speech recognition (Seki et al 2018), intelligent speakers use microphone arrays that can detect changes in the speech signal produced by simultaneous speakers allowing the detection of the direction to the closest speaker and then enhancing the microphone signal for that speaker (Khoubrouy and Hansen 2016). Diarization or speaker recognition algorithms (Shafey, Soltau, and Shafran 2019) could also be used to improve the accuracy and performance of the system.…”
Section: General Considerations
Citation type: mentioning
confidence: 99%
“…where <> denotes expectation. Note that σ² is independent of s and known, and hence we can drop it in (4). As a result,…”
Section: Wideband Dcbf
Citation type: mentioning
confidence: 99%
“…Although the automatic speech recognition (ASR) products have been widely implemented in practical applications, most of ASR systems are only suitable for short-range speech source within 5 m. The distant speech perception has not been well studied yet, and is a challenging task due to the severe signal attenuation, interference and background noise [1]-[4]. In indoor environments, reverberation is the main interference [5] while the wind noise is the main interference in outdoor environments [6].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…During wavelet transform computation, complex phase are generated with nonlinearities in signal which are removed here. These coefficients are arranges as follows (8) denotes critical sampling rate, using this layer coefficients, localization information can be achieved in time and frequency domain by adjusting the frequency resolution of wavelets. Robustness of the system is increased by down sampling the signal with filter bank and taking the modulus of oscillatory components.…”
Section: B) Joint Time-frequency Pyramid Scattering
Citation type: mentioning
confidence: 99%