Microphone Array Processing Strategies for Distant-Based Automatic Speech Recognition

Khoubrouy, Soudeh A.; Hansen, John H. L.

doi:10.1109/lsp.2016.2592683

Cited by 8 publications

(4 citation statements)

References 11 publications


“…Handling multiple users: The presence of more than one user speaking at the same time creates crosstalk, making it difficult for the system to transcribe the speech correctly. To minimize this problem, end-to-end systems are trained to perform source separation and speech recognition (Seki et al. 2018), and intelligent speakers use microphone arrays that can detect changes in the speech signal produced by simultaneous speakers, allowing them to detect the direction of the closest speaker and then enhance the microphone signal for that speaker (Khoubrouy and Hansen 2016). Diarization or speaker recognition algorithms (Shafey, Soltau, and Shafran 2019) could also be used to improve the accuracy and performance of the system.…”
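The excerpt credits microphone arrays with locating the closest speaker and then enhancing that speaker's signal. The sketch below shows a minimal two-microphone version of that idea. It estimates the time difference of arrival by brute-force cross-correlation, converts it to a far-field angle, and applies a delay-and-sum beamformer. This is not the method of Khoubrouy and Hansen (2016). The spacing `D`, the sampling rate `FS`, the synthetic signals, and all function names are illustrative assumptions. Practical systems typically use more microphones and more robust estimators such as GCC-PHAT.

```python
import math
import random

FS = 16000   # sampling rate in Hz (assumed)
C = 343.0    # speed of sound in air, m/s
D = 0.2      # spacing between the two microphones in m (assumed)

def estimate_lag(x1, x2, max_lag):
    """Integer lag (samples) maximizing sum_n x1[n] * x2[n + lag] (hypothetical helper)."""
    n = len(x1)
    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        lo, hi = max(0, -lag), min(n, n - lag)
        val = sum(x1[i] * x2[i + lag] for i in range(lo, hi))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

def lag_to_angle_deg(lag):
    """Far-field direction of arrival from D * sin(theta) = C * lag / FS."""
    s = max(-1.0, min(1.0, C * lag / (FS * D)))
    return math.degrees(math.asin(s))

def delay_and_sum(x1, x2, lag):
    """Align mic 2 to mic 1 by `lag` samples and average; also return the start index."""
    n = len(x1)
    lo, hi = max(0, -lag), min(n, n - lag)
    return [0.5 * (x1[i] + x2[i + lag]) for i in range(lo, hi)], lo

# Synthetic scene: a broadband source that reaches mic 2 five samples after mic 1,
# with independent sensor noise on each channel.
rng = random.Random(0)
N, TRUE_LAG = 4000, 5
src = [rng.gauss(0.0, 1.0) for _ in range(N)]
mic1 = [src[i] + rng.gauss(0.0, 0.3) for i in range(N)]
mic2 = [(src[i - TRUE_LAG] if i >= TRUE_LAG else 0.0) + rng.gauss(0.0, 0.3) for i in range(N)]

max_lag = math.ceil(D * FS / C)          # largest physically possible lag for this spacing
lag = estimate_lag(mic1, mic2, max_lag)
angle = lag_to_angle_deg(lag)
enhanced, start = delay_and_sum(mic1, mic2, lag)

# Squared error against the clean source: single microphone vs. beamformer output.
err_mic = sum((mic1[start + j] - src[start + j]) ** 2 for j in range(len(enhanced)))
err_das = sum((y - src[start + j]) ** 2 for j, y in enumerate(enhanced))
print(lag, round(angle, 1), err_das < err_mic)
```

Averaging two aligned channels leaves the source unchanged and halves the power of uncorrelated noise. That is where the roughly 3 dB gain of a two-element delay-and-sum beamformer comes from.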

Section: General Considerations (mentioning, confidence: 99%)

Considerations on creating conversational agents for multiple environments and users

Cebrián, Martínez-Jiménez, Rodríguez, et al. 2021

AI Magazine


Advances in artificial intelligence algorithms and the expansion of straightforward cloud-based platforms have enabled both medium and large companies to adopt conversational assistants that facilitate interaction between clients and employees. These interactions are possible through ubiquitous devices (e.g., Amazon Echo, Apple HomePod, Google Nest), virtual assistants (e.g., Apple Siri, Google Assistant, Samsung Bixby, or Microsoft Cortana), chat windows on corporate websites, or social network applications (e.g., Facebook Messenger, Telegram, Slack, WeChat).

Creating a useful, personalized conversational agent that is also robust and popular is nonetheless challenging. It requires picking the right algorithm, framework, and/or communication channel. Perhaps more importantly, it also requires careful consideration of the specific task, user needs, environment, available training data, and budget, as well as a thoughtful design.

In this paper, we consider the elements necessary to create a conversational agent for different types of users, environments, and tasks. These elements account for the limited amount of data available for specific tasks within a company and for non-English languages. We are confident that we can provide a useful resource for new practitioners developing an agent. We point out novice problems and traps to avoid, and we make readers aware that developing the technology is achievable despite comprehensive and significant challenges. We also raise awareness of different ethical issues that may be associated with this technology. We have compiled our experience deploying conversational systems for daily use in multicultural, multilingual, and intergenerational settings. Additionally, we give insight into how to scale the proposed solutions.



“…where ⟨·⟩ denotes expectation. Note that σ² is independent of s and known, and hence we can drop it in (4). As a result,…”

Section: Wideband DCBF (mentioning, confidence: 99%)

“…Although automatic speech recognition (ASR) products have been widely deployed in practical applications, most ASR systems are suitable only for short-range speech sources within 5 m. Distant speech perception has not yet been well studied, and it is a challenging task because of severe signal attenuation, interference, and background noise [1]-[4]. In indoor environments, reverberation is the main interference [5], while wind noise is the main interference in outdoor environments [6].…”

Section: Introduction (mentioning, confidence: 99%)

Deconvolved Conventional Beamforming and Adaptive Cubature Kalman Filter Based Distant Speech Perception System

et al. 2020


This paper proposes a spatial-temporal processing framework for a distant speech perception system that integrates speech enhancement and speech tracking. First, weak speech signals are enhanced by deconvolved conventional beamforming (DCBF) with a microphone array. Because DCBF has a narrow beamwidth and low sidelobes, competing sources can be effectively suppressed without introducing extra speech distortion. Second, with the accurate bearing provided by DCBF, the cubature Kalman filter can be used to track the speech source of interest. By introducing a scaling factor into the current statistical motion model, the authors propose a new tracking algorithm that is suitable for both maneuvering and nonmaneuvering speech sources. The scaling factor can be adjusted adaptively to improve the tracking performance of the proposed algorithm for different motion models. Numerical results show that the proposed algorithm can provide better tracking performance than the conventional one. In particular, the tracking root mean square error can be reduced by half in some cases.

INDEX TERMS Cubature Kalman filter, deconvolved conventional beamforming, improved current statistical motion model, maneuvering speech source, speech perception system.
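The abstract attributes the narrow beamwidth and low sidelobes of DCBF to deconvolution. In the formulation assumed here, the conventional (delay-and-sum) beamformer's power map over u = sin θ is the source distribution convolved with the array beam pattern. DCBF recovers a sharper map by applying Richardson-Lucy deconvolution to it. The sketch below makes several assumptions: it is narrowband and noise-free, it uses an 8-element half-wavelength uniform linear array, and its helper names are hypothetical. It is not the wideband implementation of the cited paper.

```python
import cmath
import math

M = 8                                            # microphones in a uniform linear array (assumed)
KD = math.pi                                     # wavenumber x spacing; pi = half-wavelength spacing
GRID = [-1.0 + 0.02 * i for i in range(101)]     # scan grid in u = sin(theta)

def beam_power(du):
    """Normalized CBF power response to a plane wave offset by du in u = sin(theta)."""
    s = sum(cmath.exp(1j * KD * m * du) for m in range(M))
    return abs(s) ** 2 / M ** 2

def cbf_map(sources):
    """Noise-free narrowband CBF output for incoherent sources given as (u, power) pairs."""
    return [sum(p * beam_power(u - us) for us, p in sources) for u in GRID]

def dcbf_map(y, iters=100, eps=1e-12):
    """Richardson-Lucy deconvolution of the CBF map y, using the beam pattern as the PSF."""
    n = len(GRID)
    psf = [[beam_power(GRID[i] - GRID[j]) for j in range(n)] for i in range(n)]
    col = [sum(psf[i][j] for i in range(n)) for j in range(n)]
    x = [1.0] * n                                # flat, strictly positive initial guess
    for _ in range(iters):
        blurred = [sum(psf[i][j] * x[j] for j in range(n)) for i in range(n)]
        ratio = [y[i] / (blurred[i] + eps) for i in range(n)]
        x = [x[j] * sum(psf[i][j] * ratio[i] for i in range(n)) / col[j] for j in range(n)]
    return x

def half_power_points(m):
    """Number of grid points within 3 dB of the map's peak (a crude beamwidth measure)."""
    peak = max(m)
    return sum(1 for v in m if v >= 0.5 * peak)

source_u = GRID[65]                              # one source at u = 0.3 (theta about 17.5 deg)
cbf = cbf_map([(source_u, 1.0)])
dcbf = dcbf_map(cbf)
peak_cbf = max(range(len(GRID)), key=cbf.__getitem__)
peak_dcbf = max(range(len(GRID)), key=dcbf.__getitem__)
print(peak_cbf, peak_dcbf, half_power_points(cbf), half_power_points(dcbf))
```

The update is multiplicative, so the map stays nonnegative. For a uniform linear array, the beam pattern depends only on the difference in u, which is why the scan grid is laid out in sin θ rather than in θ.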


“…During wavelet transform computation, complex phases are generated along with nonlinearities in the signal, which are removed here. These coefficients are arranged as follows: (8), where … denotes the critical sampling rate. Using these layer coefficients, localization information can be obtained in the time and frequency domains by adjusting the frequency resolution of the wavelets. The robustness of the system is increased by downsampling the signal with a filter bank and taking the modulus of the oscillatory components.…”
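The excerpt describes the usual recipe for first-order scattering coefficients: filter with a wavelet bank, take the modulus to discard the oscillating phase, then low-pass and downsample. The minimal sketch below rests on three assumptions: complex Gabor wavelets with a constant quality factor `q`, a boxcar average as the low-pass filter, and a hop of 32 samples. It does not reproduce the authors' joint time-frequency pyramid scattering or their equation (8), and all names are hypothetical.

```python
import cmath
import math

def gabor_wavelet(omega, q=4.0):
    """Complex Gabor filter centred at `omega` rad/sample with unit gain at omega (assumed shape)."""
    sigma = q / omega                          # constant-Q: longer in time at low frequencies
    half = int(math.ceil(3 * sigma))
    taps = range(-half, half + 1)
    env = [math.exp(-t * t / (2 * sigma * sigma)) for t in taps]
    norm = sum(env)
    return [e / norm * cmath.exp(1j * omega * t) for e, t in zip(env, taps)], half

def scatter_first_order(x, omegas, hop=32):
    """S1[k][m]: modulus of wavelet coefficients, averaged and downsampled by `hop`."""
    n = len(x)
    out = []
    for omega in omegas:
        h, half = gabor_wavelet(omega)
        mod = []
        for i in range(n):                     # 'same'-size convolution, zero-padded edges
            acc = 0j
            for k, hk in enumerate(h):
                j = i - (k - half)
                if 0 <= j < n:
                    acc += hk * x[j]
            mod.append(abs(acc))               # modulus removes the oscillating phase
        # Boxcar low-pass (assumed stand-in for the scaling function), then downsample.
        out.append([sum(mod[m:m + hop]) / hop for m in range(0, n - hop + 1, hop)])
    return out

x = [math.cos(0.8 * i) for i in range(512)]    # test tone at 0.8 rad/sample
omegas = [0.2, 0.4, 0.8, 1.6]                  # octave-spaced centre frequencies (assumed)
S1 = scatter_first_order(x, omegas)
means = [sum(row) / len(row) for row in S1]
print([round(m, 3) for m in means])
```

For a pure tone, the averaged modulus is largest in the channel whose centre frequency matches the tone. It is also nearly constant over time, because the modulus removes the phase rotation.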

Section: B) Joint Time-Frequency Pyramid Scattering (mentioning, confidence: 99%)

Improving the Performance of Automatic Speech Recognition Using Blind Source Separation

S, Avinash, Nataraja 2019

IJEAT


Speech recognition systems have grown in importance in real-world applications, both online and offline, such as security, robotics, and speech translation. These systems are now widely used. However, the signals are acquired with various instruments that introduce noise, source mixing, and other impurities, and these degrade recognition performance. This work addresses one cause of that degradation: source mixing in the original speech signal. To overcome it, we propose a new approach based on non-negative matrix factorization (NMF) modelling. The method applies a scattering transform, built from a wavelet filter bank and pyramid scattering, to estimate the sources and to minimize unwanted signals. After the sources are estimated, a source separation algorithm is implemented using an optimization process based on training and testing. The proposed approach is compared with other existing methods using performance measurement metrics, which show better performance
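The abstract does not state the NMF update rules. The sketch below assumes the classic Lee-Seung multiplicative updates for the Euclidean cost, applied to a toy magnitude spectrogram. It then splits the mixture into one component per basis vector using soft (Wiener-like) masks. It does not reproduce the authors' scattering-transform front end or their training/testing optimization, and the data and function names are hypothetical.

```python
import random

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def nmf(v, rank, iters=300, seed=0, eps=1e-12):
    """Lee-Seung multiplicative updates minimizing ||V - WH||_F^2 with V, W, H >= 0."""
    rng = random.Random(seed)
    f, t = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(f)]
    h = [[rng.random() + 0.1 for _ in range(t)] for _ in range(rank)]
    errors = []
    for _ in range(iters):
        num, den = matmul(transpose(w), v), matmul(matmul(transpose(w), w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(t)] for i in range(rank)]
        num, den = matmul(v, transpose(h)), matmul(w, matmul(h, transpose(h)))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(f)]
        wh = matmul(w, h)
        errors.append(sum((v[i][j] - wh[i][j]) ** 2 for i in range(f) for j in range(t)))
    return w, h, errors

def separate(v, w, h, eps=1e-12):
    """One nonnegative component per basis vector, via soft masks w_k h_k / (WH)."""
    wh = matmul(w, h)
    return [[[v[i][j] * w[i][k] * h[k][j] / (wh[i][j] + eps) for j in range(len(v[0]))]
             for i in range(len(v))] for k in range(len(h))]

# Toy magnitude "spectrogram" (6 bins x 8 frames): two sources with fixed spectra.
spec_a = [1.0, 0.8, 0.1, 0.0, 0.0, 0.1]
spec_b = [0.0, 0.1, 0.2, 0.9, 1.0, 0.3]
act_a = [1, 1, 0, 0, 1, 0.5, 0, 1]
act_b = [0, 0.5, 1, 1, 0, 1, 1, 0]
V = [[spec_a[f] * act_a[t] + spec_b[f] * act_b[t] for t in range(8)] for f in range(6)]

W, H, errors = nmf(V, rank=2)
parts = separate(V, W, H)
print(round(errors[0], 4), round(errors[-1], 6))
```

The masks sum to one in every time-frequency bin, so the separated components add back up to the mixture. With more than one basis vector per source, the components would be grouped by source before masking.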

