Puming Zhan scite author profile

Puming Zhan

4 Publications

122 Citation Statements Received

67 Citation Statements Given

Affiliations

Guangdong University of Technology, Nuance Communications (Austria), Carnegie Mellon University

Publications

Janus-III: speech-to-speech translation in multiple languages

Lavie, Waibel, Levin, et al.

This paper describes JANUS-III, our most recent version of the JANUS speech-to-speech translation system. We present an overview of the system and focus on how system design facilitates speech translation between multiple languages, and allows for easy adaptation to new source and target languages. We also describe our methodology for evaluation of end-to-end system performance with a variety of source and target languages. For system development and evaluation, we have experimented with both push-to-talk and cross-talk recording conditions. To date, our system has achieved performance levels of over 80% acceptable translations on transcribed input, and over 70% acceptable translations on speech input recognized with a 75-90% word accuracy. Our current major research is concentrated on enhancing the capabilities of the system to deal with input in broad and general domains.

Vocal Tract Length Normalization for Large Vocabulary Continuous Speech Recognition

Zhan, Waibel, 1997

Deep Learning Based Mandarin Accent Identification for Accent Robust ASR

Weninger, Sun, Park, et al. 2019

In this paper, we present an in-depth study on the classification of regional accents in Mandarin speech. Experiments are carried out on Mandarin speech data systematically collected from 15 different geographical regions in China for broad coverage. We explore bidirectional Long Short-Term Memory (bLSTM) networks and i-vectors to model longer-term acoustic context. Starting from the classification of the collected data into the 15 regional accents, we derive a three-class grouping via non-metric dimensional scaling (NMDS), for which 68.4% average recall can be obtained. Furthermore, we evaluate a state-of-the-art ASR system on the accented data and demonstrate that the character error rate (CER) varies strongly among these accent groups, even if i-vector speaker adaptation is used. Finally, we show that model selection based on the prediction of our bLSTM accent classifier can yield up to 7.6% CER reduction for accented speech.

Listen, Attend, Spell and Adapt: Speaker Adapted Sequence-to-Sequence ASR

Weninger, Andrés-Ferrer, Li, et al. 2019

Sequence-to-sequence (seq2seq) based ASR systems have shown state-of-the-art performance while having clear advantages in terms of simplicity. However, comparisons are mostly done on speaker independent (SI) ASR systems, though speaker adapted conventional systems are commonly used in practice for improving robustness to speaker and environment variations. In this paper, we apply speaker adaptation to seq2seq models with the goal of matching the performance of conventional ASR adaptation. Specifically, we investigate Kullback-Leibler divergence (KLD) as well as Linear Hidden Network (LHN) based adaptation for seq2seq ASR, using different amounts (up to 20 hours) of adaptation data per speaker. Our SI models are trained on large amounts of dictation data and achieve state-of-the-art results. We obtained 25% relative word error rate (WER) improvement with KLD adaptation of the seq2seq model vs. 18.7% gain from acoustic model adaptation in the conventional system. We also show that the WER of the seq2seq model decreases log-linearly with the amount of adaptation data. Finally, we analyze adaptation based on the minimum WER criterion and adapting the language model (LM) for score fusion with the speaker adapted seq2seq model, which result in further improvements of the seq2seq system performance.
