Sequence-to-sequence models have shown success in end-to-end speech recognition. However, these models have only used shallow acoustic encoder networks. In our work, we successively train very deep convolutional networks to add more expressive power and better generalization to end-to-end ASR models. We apply network-in-network principles, batch normalization, residual connections, and convolutional LSTMs to build very deep recurrent and convolutional structures. Our models exploit the spectral structure in the feature space and add computational depth without overfitting issues. We experiment on the WSJ ASR task and achieve a 10.5% word error rate without any dictionary or language model, using a 15-layer deep network.
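The residual connections mentioned in this abstract can be illustrated with a minimal sketch. The block below is a generic NumPy toy (a same-padded 1-D cross-correlation with an identity shortcut), not the paper's actual 15-layer architecture; the function names and shapes are illustrative assumptions.

```python
import numpy as np

def conv1d(x, w):
    """Same-padded 1-D cross-correlation along time.

    x: (in_channels, T) feature map; w: (out_channels, in_channels, k) kernel.
    (A "conv" layer in deep-learning frameworks is typically cross-correlation.)
    """
    out_ch, in_ch, k = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    T = x.shape[1]
    y = np.zeros((out_ch, T))
    for t in range(T):
        # Dot every kernel with the local (in_channels x k) patch.
        y[:, t] = np.tensordot(w, xp[:, t:t + k], axes=([1, 2], [0, 1]))
    return y

def residual_block(x, w1, w2):
    """y = x + Conv(ReLU(Conv(x))).

    The identity shortcut lets very deep stacks train: if the convolutions
    learn nothing (zero weights), the block reduces to the identity.
    """
    h = np.maximum(conv1d(x, w1), 0.0)
    return x + conv1d(h, w2)
```

With zero weights the block passes its input through unchanged, which is the property that makes stacking many such blocks benign.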
This paper introduces a new speech corpus called "LibriTTS" designed for text-to-speech use. It is derived from the original audio and text materials of the LibriSpeech corpus, which has been used for training and evaluating automatic speech recognition systems. The new corpus inherits desired properties of the LibriSpeech corpus while addressing a number of issues that make LibriSpeech less than ideal for text-to-speech work. The released corpus consists of 585 hours of speech data at a 24 kHz sampling rate from 2,456 speakers, together with the corresponding texts. Experimental results show that neural end-to-end TTS models trained on the LibriTTS corpus achieved mean opinion scores above 4.0 for naturalness for five out of six evaluation speakers. The corpus is freely available for download from http://www.openslr.org/60/.
Common spatial pattern (CSP)-based spatial filtering is the most popular approach to electroencephalogram (EEG) feature extraction for motor imagery (MI) classification in brain-computer interface (BCI) applications. The effectiveness of CSP is highly affected by the frequency band and time window of the EEG segments. Although numerous algorithms have been designed to optimize the spectral bands of CSP, most of them select the time window heuristically. This is likely to result in suboptimal feature extraction, since the time period during which the brain's response to the mental task occurs may not be accurately detected. In this paper, we propose a novel algorithm, namely temporally constrained sparse group spatial pattern (TSGSP), for the simultaneous optimization of filter bands and time windows within CSP to further boost the classification accuracy of MI EEG. Specifically, spectrum-specific signals are first derived by bandpass filtering the raw EEG data at a set of overlapping filter bands. Each of the spectrum-specific signals is further segmented into multiple subseries using a sliding window approach. We then devise a joint sparse optimization of filter bands and time windows with a temporal smoothness constraint to extract robust CSP features under a multitask learning framework. A linear support vector machine classifier is trained on the optimized EEG features to accurately identify the MI tasks. An experimental study on three public EEG datasets (BCI Competition III dataset IIIa, BCI Competition IV dataset IIa, and BCI Competition IV dataset IIb) validates the effectiveness of TSGSP in comparison with several competing methods. The superior classification performance (averaged accuracies of 88.5%, 83.3%, and 84.3% for the three datasets, respectively) confirms that the proposed algorithm is a promising candidate for improving the performance of MI-based BCIs.
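The CSP spatial filtering that TSGSP builds on can be sketched in a few lines. This is the standard two-class CSP solved via whitening and eigendecomposition, not the paper's sparse group optimization; function names and the log-variance feature helper are illustrative assumptions.

```python
import numpy as np

def csp_filters(C1, C2):
    """Standard two-class CSP from per-class spatial covariance matrices.

    Solves the generalized eigenproblem C1 w = lambda (C1 + C2) w by whitening
    the composite covariance, then diagonalizing the whitened class-1
    covariance. Returns W (n_channels x n_channels) whose rows are spatial
    filters ordered by decreasing class-1 variance ratio.
    """
    Cc = C1 + C2
    d, U = np.linalg.eigh(Cc)               # eigh: ascending eigenvalues
    P = np.diag(1.0 / np.sqrt(d)) @ U.T     # whitening transform for Cc
    S1 = P @ C1 @ P.T
    lam, V = np.linalg.eigh(S1)
    order = np.argsort(lam)[::-1]           # most class-1-discriminative first
    return V[:, order].T @ P

def log_var_features(W, X, k=1):
    """Log-normalized variance of the k first and k last filtered components,
    the usual CSP feature vector fed to a linear classifier."""
    Wk = np.vstack([W[:k], W[-k:]])
    v = np.var(Wk @ X, axis=1)
    return np.log(v / v.sum())
```

By construction, W jointly whitens the two classes: W(C1+C2)Wᵀ = I, and W C1 Wᵀ and W C2 Wᵀ are diagonal with entries summing to one, so the first and last filters maximize the variance ratio between the classes.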
Antidepressants are widely prescribed, but their efficacy relative to placebo is modest, in part because the clinical diagnosis of major depression encompasses biologically heterogeneous conditions. Here, we sought to identify a neurobiological signature of response to antidepressant treatment as compared to placebo. We designed a latent-space machine learning algorithm tailored for resting-state electroencephalography (rsEEG) and applied it to data from the largest imaging-coupled, placebo-controlled antidepressant study (n=309). Symptom improvement was robustly predicted in a manner both specific for the antidepressant sertraline (versus placebo) and generalizable across different study sites and EEG equipment. This sertraline-predictive EEG signature generalized to two depression samples, wherein it reflected general antidepressant medication responsivity, and related differentially to repetitive transcranial magnetic stimulation (rTMS) treatment outcome. Furthermore, we found that the sertraline rsEEG signature indexed prefrontal neural responsivity, as measured by concurrent TMS/EEG. Our findings advance the neurobiological understanding of antidepressant treatment through an EEG-tailored computational model and provide a clinical avenue for personalized treatment of depression.
We employ a combination of recent developments in semi-supervised learning for automatic speech recognition to obtain state-of-the-art results on LibriSpeech, utilizing the unlabeled audio of the Libri-Light dataset. More precisely, we carry out noisy student training with SpecAugment using giant Conformer models pretrained with wav2vec 2.0. By doing so, we achieve word error rates (WERs) of 1.4%/2.6% on the LibriSpeech test/test-other sets, against the current state-of-the-art WERs of 1.7%/3.3%.
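The SpecAugment masking used in the noisy student training above can be sketched as a minimal NumPy routine. This version applies only frequency and time masking (the original recipe also includes time warping); the parameter names F and T echo the SpecAugment paper's mask-size limits, but the function itself is a hypothetical helper, not this paper's implementation.

```python
import numpy as np

def spec_augment(spec, rng, F=10, T=20, n_freq_masks=1, n_time_masks=1):
    """Mask random frequency bands and time spans of a spectrogram.

    spec: (freq_bins x frames) log-mel spectrogram; masked regions are zeroed.
    F / T bound the width of each frequency / time mask. Returns a new array.
    """
    spec = spec.copy()
    n_freq, n_time = spec.shape
    for _ in range(n_freq_masks):
        f = rng.integers(0, F + 1)                    # mask width in [0, F]
        f0 = rng.integers(0, max(1, n_freq - f + 1))  # mask start
        spec[f0:f0 + f, :] = 0.0
    for _ in range(n_time_masks):
        t = rng.integers(0, T + 1)
        t0 = rng.integers(0, max(1, n_time - t + 1))
        spec[:, t0:t0 + t] = 0.0
    return spec
```

In noisy student training, such masking is applied to the student's input so it must match the teacher's pseudo-labels from partially hidden evidence.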