ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9054450
Using Automatic Speech Recognition and Speech Synthesis to Improve the Intelligibility of Cochlear Implant Users in Reverberant Listening Environments

Cited by 5 publications (3 citation statements). References 19 publications.
“…Automatic speech recognition (ASR) has a long history of research (Bahl et al., 1983; Hinton et al., 2012; Chu et al., 2020). By audio signal processing and modeling, speech contents can be transcribed into texts for various applications (Yu and Deng, 2016; Yang et al., 2021).…”
Section: Introduction (mentioning)
confidence: 99%
“…Automatic speech recognition (ASR) systems have experienced substantial improvements in recognizing speech in reverberant environments [7]. We [8] and others [9] implemented a strategy in CIs that leverages ASR to translate reverberant speech into an estimated text sequence and uses speech synthesis to generate anechoic speech from the predicted text. This ASR speech synthesis strategy substantially improved reverberant speech intelligibility in CI users [9] and in normal hearing listeners using vocoded speech [8].…”
Section: Introduction (mentioning)
confidence: 99%
“…We [8] and others [9] implemented a strategy in CIs that leverages ASR to translate reverberant speech into an estimated text sequence and uses speech synthesis to generate anechoic speech from the predicted text. This ASR speech synthesis strategy substantially improved reverberant speech intelligibility in CI users [9] and in normal hearing listeners using vocoded speech [8]. However, the ASR speech synthesis strategy is not real-time feasible in a CI processor because it imposes a processing delay that exceeds the maximum audio-visual delay that CI users can tolerate, which is about 260 ms [10].…”
Section: Introduction (mentioning)
confidence: 99%
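The latency constraint in the statement above can be sketched as a simple budget check: an ASR stage followed by a synthesis stage must complete within the roughly 260 ms audio-visual delay that CI users tolerate [10]. The two stages below are hypothetical placeholders (their sleep times are illustrative assumptions, not measurements from the cited works); the sketch only shows how a pipeline's end-to-end delay would be compared against the tolerance.

```python
import time

# Maximum audio-visual delay CI users can tolerate, per [10] (ms).
MAX_DELAY_MS = 260.0

def asr_transcribe(audio: bytes) -> str:
    """Hypothetical ASR stage: reverberant audio -> estimated text.

    The sleep stands in for model inference time (assumed value).
    """
    time.sleep(0.15)
    return "estimated text sequence"

def synthesize_speech(text: str) -> bytes:
    """Hypothetical synthesis stage: predicted text -> anechoic speech."""
    time.sleep(0.20)  # stand-in for synthesis time (assumed value)
    return b"anechoic audio"

def pipeline_latency_ms(audio: bytes) -> float:
    """Measure end-to-end delay of the ASR -> synthesis pipeline."""
    start = time.perf_counter()
    synthesize_speech(asr_transcribe(audio))
    return (time.perf_counter() - start) * 1000.0

latency = pipeline_latency_ms(b"reverberant audio frame")
print(f"pipeline latency: {latency:.0f} ms; "
      f"within CI tolerance: {latency <= MAX_DELAY_MS}")
```

With the assumed stage times (150 ms + 200 ms), the total exceeds the 260 ms budget, illustrating why the quoted statement calls the strategy not real-time feasible in a CI processor.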