Kehuang Li scite author profile

A deep neural network approach to speech bandwidth expansion

We propose a deep neural network (DNN) approach to speech bandwidth expansion (BWE) that estimates the spectral mapping function from narrowband (4 kHz bandwidth) to wideband (8 kHz bandwidth) speech. Log-spectrum power is used as both the input and the output feature to perform the required nonlinear transformation, and DNNs are trained to realize this high-dimensional mapping function. When evaluating the proposed approach on a large-scale 10-hour test set, we found that the DNN-expanded speech signals achieve excellent objective quality, in terms of segmental signal-to-noise ratio and log-spectral distortion, when compared with conventional BWE based on Gaussian mixture models (GMMs). Subjective listening tests also give a 69% preference score for DNN-expanded speech versus 31% for GMM-expanded speech when the phase information is assumed known. For tests in real operation, when the phase information is imaged from the given narrowband signal, the preference comparison rises to 84% versus 16%. Correct phase recovery can further improve the BWE performance of the proposed DNN method.
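The abstract names log-spectral distortion as one of its objective measures but does not spell out a formula. The sketch below assumes one common definition: the root-mean-square difference, in dB, between the log-power spectra of a reference frame and an estimated frame, averaged over frames. The function name, the `eps` floor, and the list-of-frames input layout are illustrative choices, not taken from the paper, whose exact formulation may differ.

```python
import math

def log_spectral_distortion(ref_power, est_power, eps=1e-12):
    """Average per-frame RMS difference between log-power spectra, in dB.

    ref_power, est_power: lists of frames; each frame is a list of
    non-negative power values, one per frequency bin (assumed layout).
    eps: small floor that avoids log10(0); an illustrative choice.
    """
    if not ref_power or len(ref_power) != len(est_power):
        raise ValueError("inputs must contain the same non-zero number of frames")
    per_frame = []
    for ref_frame, est_frame in zip(ref_power, est_power):
        if not ref_frame or len(ref_frame) != len(est_frame):
            raise ValueError("frames must contain the same non-zero number of bins")
        # Squared dB difference for every frequency bin in this frame.
        squared = [
            (10.0 * math.log10(r + eps) - 10.0 * math.log10(e + eps)) ** 2
            for r, e in zip(ref_frame, est_frame)
        ]
        per_frame.append(math.sqrt(sum(squared) / len(squared)))
    # Average the per-frame distortions over the whole utterance.
    return sum(per_frame) / len(per_frame)

# Toy example: the estimate has ten times the reference power in every bin,
# which is a uniform 10 dB offset.
lsd_offset = log_spectral_distortion([[1.0, 1.0]], [[10.0, 10.0]])
lsd_same = log_spectral_distortion([[0.5, 2.0]], [[0.5, 2.0]])
```

Lower values indicate that the expanded spectrum is closer to the true wideband spectrum; identical spectra give a distortion of zero.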

A Reverberation-Time-Aware Approach to Speech Dereverberation Based on Deep Neural Networks

Yang et al. 2017

IEEE/ACM Trans. Audio Speech Lang. Process.

An End-to-End Deep Learning Approach to Simultaneous Speech Dereverberation and Acoustic Modeling for Robust Speech Recognition

et al. 2017

IEEE J. Sel. Top. Signal Process.

Deep learning with maximal figure-of-merit cost to advance multi-label speech attribute detection

Kukanov, Hautamäki, Siniscalchi et al. 2016

DNN-based speech bandwidth expansion and its application to adding high-frequency missing features for automatic speech recognition of narrowband speech

Huang et al. 2015

Detecting Mispronunciations of L2 Learners and Providing Corrective Feedback Using Knowledge-Guided and Data-Driven Decision Trees

Siniscalchi et al. 2016

We propose a novel decision-tree-based framework to detect phonetic mispronunciations that L2 learners produce by using inaccurate speech attributes, such as manner and place of articulation. Compared with conventional score-based computer-assisted pronunciation training (CAPT) systems, the proposed framework has three advantages: (1) each mispronunciation in a tree can be interpreted and communicated to the L2 learners by traversing the corresponding path from a leaf node to the root node; (2) corrective feedback based on speech attribute features, which directly describe how consonants and vowels are produced using the related articulators, can be provided to the L2 learners; and (3) by building a phone-dependent decision tree, the relative importance of the speech attribute features of a target phone can be learned automatically and used to distinguish that phone from other phones. This information can provide L2 learners with speech attribute feedback ranked in order of importance. In addition to these advantages, experimental results confirm that the proposed approach can detect most pronunciation errors and provide accurate diagnostic feedback.
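Advantage (1) above, reading a diagnosis by walking from a leaf back to the root, can be illustrated with a toy tree. This is only a sketch under stated assumptions: the `Node` class, the `explain` helper, and the attribute questions and labels are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    label: str                      # attribute test (internal node) or diagnosis (leaf)
    parent: Optional["Node"] = None
    branch: Optional[str] = None    # outcome of the parent's test that leads here

def explain(leaf: Node) -> List[str]:
    """Collect the decisions on the path from a leaf up to the root."""
    steps = []
    node = leaf
    while node.parent is not None:
        # Record which test was asked at the parent and which branch was taken.
        steps.append(f"{node.parent.label} -> {node.branch}")
        node = node.parent
    return steps

# Hypothetical three-node path: root test, one inner test, one leaf diagnosis.
root = Node("fricative score high?")
inner = Node("voiced score high?", parent=root, branch="no")
leaf = Node("mispronounced: manner of articulation", parent=inner, branch="yes")

feedback = explain(leaf)
```

Each entry in `feedback` names one attribute decision, so the list can be turned directly into a human-readable explanation for the learner.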

Sign Transition Modeling and a Scalable Solution to Continuous Sign Language Recognition for Real-World Applications

Zhou, Lee 2016

ACM Trans. Access. Comput.

We propose a new approach to modeling transition information between signs in continuous Sign Language Recognition (SLR) and address some scalability issues in designing SLR systems. In contrast to Automatic Speech Recognition (ASR), in which the transition between speech sounds is often brief and mainly addressed by the coarticulation effect, the sign transition in continuous SLR is far from clear and usually cannot be easily or exactly characterized. Leveraging hidden Markov modeling techniques from ASR, we propose a modeling framework for continuous SLR with the following major advantages: (i) the system is easy to scale up to large-vocabulary SLR; (ii) modeling of signs, as well as of the transitions between signs, is robust even for noisy data collected in real-world SLR; and (iii) extensions to training, decoding, and adaptation are directly applicable, even with new deep learning algorithms. A pair of low-cost digital gloves, affordable for the deaf and hard of hearing community, is used to collect training and testing data for real-world SLR interaction applications. Evaluated on 1,024 testing sentences from five signers, the system achieves a word accuracy rate of 87.4% with a vocabulary of 510 words. Recognition runs in real time, requiring an average of 0.69 s per sentence. The encouraging results indicate that it is feasible to develop real-world SLR applications based on the proposed SLR framework.
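The abstract reports a word accuracy rate without defining it. A minimal sketch follows, assuming the definition commonly used in ASR: one minus the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. The function name and the example sentences are illustrative, and the paper may compute the figure differently.

```python
from typing import Sequence

def word_accuracy(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """1 - (substitutions + deletions + insertions) / len(reference)."""
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        cur = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            cur[j] = min(
                prev[j] + 1,         # deletion of a reference word
                cur[j - 1] + 1,      # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = cur
    return 1.0 - prev[m] / n

# One of four reference words is missing from the hypothesis.
acc = word_accuracy("i want water now".split(), "i want now".split())
```

Because insertions count as errors, this measure can drop below zero when the hypothesis is much longer than the reference.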

A study on target feature activation and normalization and their impacts on the performance of DNN based speech dereverberation systems

Yang, Lee 2016
