Dereverberation of autoregressive envelopes for far-field speech recognition

Purushothaman, Anurenjan; Sreeram, Anirudh; Kumar, Rohit; Ganapathy, Sriram

doi:10.1016/j.csl.2021.101277

Cited by 7 publications

(4 citation statements)

References 42 publications

(59 reference statements)


“…Recently, end-to-end models with attention based modeling have also been explored on the REVERB challenge dataset [21,22]. Previously, we had proposed a convolutional neural network model to perform dereverberation of speech [10,11]. In the current work, we extend this prior work for E2E transformer based ASR system.…”

Section: Literature Review (mentioning)

confidence: 94%


End-To-End Speech Recognition with Joint Dereverberation of Sub-Band Autoregressive Envelopes

Kumar, Purushothaman, Sreeram et al. 2022

ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite


The end-to-end (E2E) automatic speech recognition (ASR) systems are often required to operate in reverberant conditions, where the long-term sub-band envelopes of the speech are temporally smeared. In this paper, we develop a feature enhancement approach using a neural model operating on sub-band temporal envelopes. The temporal envelopes are modeled using the framework of frequency domain linear prediction (FDLP). The neural enhancement model proposed in this paper performs an envelope gain based enhancement of temporal envelopes. The model architecture consists of a combination of convolutional and long short term memory (LSTM) neural network layers. Further, the envelope dereverberation, feature extraction and acoustic modeling using transformer based E2E ASR can all be jointly optimized for the speech recognition task. We perform E2E speech recognition experiments on the REVERB challenge dataset as well as on the VOiCES dataset. In these experiments, the proposed joint modeling approach yields significant improvements compared to the baseline E2E ASR system (average relative improvements of 21% on the REVERB challenge dataset and about 10% on the VOiCES dataset).


“…Our previous work [10,11] explored the use of dereverberation of sub-band envelopes for hybrid speech recognition systems. The sub-band envelopes are extracted using the autoregressive modeling framework of frequency domain linear prediction [12,13].…”

Section: Introduction (mentioning)

confidence: 99%

End-To-End Speech Recognition with Joint Dereverberation of Sub-Band Autoregressive Envelopes

Kumar, Purushothaman, Sreeram et al. 2022

ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite


“…One is to improve the pickup equipment or pickup method, such as using a microphone array instead of a single microphone, that is, multichannel speech signal acquisition. The second is to use some signal processing methods to improve the quality of far-field speech. In this paper, from the perspective of front-end pickup, a flexible graphene sensor is proposed to detect vocal cord vibration signals to improve speech recognition performance in the far-field environment. Like other physiological signals, such as breathing signals and pulse signals, vocal fold vibration signals are relatively weak signals.…”

Section: Introduction (mentioning)

confidence: 99%

“…The second is to use some signal processing methods to improve the quality of far-field speech [13,14]. In this paper, from the perspective of front-end pickup, a flexible graphene sensor is proposed to detect short vowels in human pronunciation using graphene-like MXene to detect vocal vibration signals. However, the preparation process of this material is complex and the cost is high, and the stability of the MXene material at room temperature is poor.…”

Section: Introduction (mentioning)

confidence: 99%

Research on Throat Speech Signal Detection Based on a Flexible Graphene Piezoresistive Sensor

Tong, Zhang, Chen et al. 2022

ACS Appl. Electron. Mater.


Aiming at the problem that traditional speech acquisition and recognition are susceptible to environmental noise, this paper proposes a flexible graphene sensor to detect vocal vibration signals. First, the speech detection sensor with a cylindrical microsurface structure substrate is prepared by chemical vapor deposition (CVD) and imprint technology, which greatly improves the conformal coating cover ability and sensitivity of the sensor. In the range of 200–2500 Hz, the average voltage gain of the sensor is ∼48 dB, and this frequency range basically covers the human speech frequency. On this basis, we conducted a bilingual detection (Chinese and English). All data obtained shows that the graphite speech sensor has sufficient sensitivity to extract the characteristics of acoustic waves. At the same time, the proposed cylindrical microsurface structure reduces the probability of random fracture of the graphene layer. In addition, the speech signals collected by a microphone and the flexible graphene speech detection sensor are used to train a neural network. The recognition accuracy of the data set mixed with vocal cord speech signals is 75.9%. The comparison verifies that the signals detected by the sensor have sufficient characteristic information to complete speech recognition tasks.


A proposed method to improve the WER of an ASR system in the noisy reverberant room

Sadeghi, Sheikhzadeh, Emadi 2024

Journal of the Franklin Institute

