Fil Alleva scite author profile

We describe the 2017 version of Microsoft's conversational speech recognition system, in which we update our 2016 system with recent developments in neural-network-based acoustic and language modeling to further advance the state of the art on the Switchboard speech recognition task. The system adds a CNN-BLSTM acoustic model to the set of model architectures we combined previously, and includes character-based and dialog-session-aware LSTM language models in rescoring. For system combination we adopt a two-stage approach, whereby subsets of acoustic models are first combined at the senone/frame level, followed by word-level voting via confusion networks. We also added a confusion network rescoring step after system combination. The resulting system yields a 5.1% word error rate on the 2000 Switchboard evaluation set.

An overview of the SPHINX-II speech recognition system

Huang

et al. 1993

In the past year at Carnegie Mellon, steady progress has been made in the area of acoustic and language modeling. The result has been a dramatic reduction in speech recognition errors in the SPHINX-II system. In this paper, we review SPHINX-II and summarize our recent efforts on improved speech recognition. Recently SPHINX-II achieved the lowest error rate in the November 1992 DARPA evaluations. For 5000-word, speaker-independent, continuous speech recognition, the error rate was reduced to 5%.

Multi-Microphone Neural Speech Separation for Far-Field Multi-Talker Speech Recognition

et al. 2018

The SPHINX-II speech recognition system: an overview

Huang

Alleva

Hon

et al. 1993

Computer Speech & Language

251

Recognizing Overlapped Speech in Meetings: A Multichannel Separation Approach Using Neural Networks

Yoshioka

Erdoğan

Chen

et al. 2018

The goal of this work is to develop a meeting transcription system that can recognize speech even when utterances of different speakers are overlapped. While speech overlaps have been regarded as a major obstacle in accurately transcribing meetings, a traditional beamformer with a single output has been exclusively used because previously proposed speech separation techniques have critical constraints for application to real meetings. This paper proposes a new signal processing module, called an unmixing transducer, and describes its implementation using a windowed BLSTM. The unmixing transducer has a fixed number, say J, of output channels, where J may be different from the number of meeting attendees, and transforms an input multi-channel acoustic signal into J time-synchronous audio streams. Each utterance in the meeting is separated and emitted from one of the output channels. Then, each output signal can be simply fed to a speech recognition back-end for segmentation and transcription. Our meeting transcription system using the unmixing transducer outperforms a system based on a state-of-the-art neural mask-based beamformer by 10.8%. Significant improvements are observed in overlapped segments. To the best of our knowledge, this is the first report that applies overlapped speech recognition to unconstrained real meeting audio.

The Microsoft 2017 Conversational Speech Recognition System
