To capitalize on the rapid development of speech-to-text (STT) technologies and the proliferation of open-source machine learning toolkits, BBN has developed Sage, a new speech processing platform that integrates technologies from multiple sources, each of which has particular strengths. In this paper, we describe the design of Sage, which allows easy interchange of STT components from different sources. We also describe our approach to fast prototyping with new machine learning toolkits, and a framework for sharing STT components across different applications. Finally, we report Sage's state-of-the-art performance on different STT tasks.
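The abstract does not show Sage's API, but the component interchange it describes suggests a plug-in design. The following is a minimal Python sketch of such an interface; all class and function names here are hypothetical illustrations, not Sage's actual code.

```python
# Hypothetical sketch of a plug-in interface that allows STT components from
# different toolkits to be interchanged; names are illustrative, not Sage's API.
from abc import ABC, abstractmethod

class FeatureExtractor(ABC):
    """Uniform contract that any toolkit-specific front end must satisfy."""
    @abstractmethod
    def extract(self, waveform):
        """Return a list of feature frames for a raw waveform."""

class WindowedExtractor(FeatureExtractor):
    # Placeholder front end; a real plug-in would wrap, e.g., a Kaldi or
    # TensorFlow implementation behind the same interface.
    def extract(self, waveform):
        return [waveform[i:i + 400] for i in range(0, len(waveform), 160)]

def transcribe(waveform, frontend: FeatureExtractor):
    # Downstream stages depend only on the interface, so components from
    # different sources can be swapped without changing the rest of the pipeline.
    features = frontend.extract(waveform)
    return features  # decoding would follow here
```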
This paper reports our recent progress on using multilingual data to improve deliverable speech-to-text (STT) systems. We continued BBN's work on using multilingual data to improve Babel evaluation systems, but focused on training time-delay neural network (TDNN) based chain models. As in the Babel evaluations, we used multilingual data in two ways: first, to train multilingual deep neural networks (DNNs) for extracting bottleneck (BN) features, and second, to initialize training on target languages. Our results show that TDNN chain models trained on multilingual DNN bottleneck features yield significant gains over their counterparts trained on MFCC plus i-vector features. By initializing from models trained on multilingual data, TDNN chain models achieve substantial improvements over random initialization of the network weights on target languages. Two other important findings are: 1) initialization with multilingual TDNN chain models produces larger gains on target languages that have less training data; 2) inclusion of target languages in multilingual training, for either BN feature extraction or initialization, has limited impact on performance measured on those target languages. Our results also show that for TDNN chain models, the combination of multilingual BN features and multilingual initialization achieves the best performance on all target languages.
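To make the two uses of multilingual data concrete, here is a hedged PyTorch sketch of (1) extracting bottleneck features from a multilingual DNN and (2) initializing a target-language model from multilingual weights. The layer sizes, bottleneck dimension, and target counts are assumptions for illustration; the paper's actual acoustic models are TDNN chain models rather than the small feed-forward network shown here.

```python
# Illustrative sketch (assumed architecture, not the paper's exact models) of
# the two uses of multilingual data: BN feature extraction and transfer init.
import torch
import torch.nn as nn

class BottleneckDNN(nn.Module):
    """Feed-forward DNN with a low-dimensional bottleneck layer."""
    def __init__(self, feat_dim=40, hidden=1024, bn_dim=80, n_targets=6000):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, bn_dim),               # bottleneck layer
        )
        self.output = nn.Linear(bn_dim, n_targets)   # multilingual targets

    def forward(self, x):
        return self.output(torch.relu(self.encoder(x)))

    def bottleneck(self, x):
        # BN features: activations at the bottleneck layer, used as inputs
        # to the downstream acoustic model.
        return self.encoder(x)

# (1) BN feature extraction with a (pretrained) multilingual DNN.
multilingual = BottleneckDNN()
frames = torch.randn(16, 40)                 # batch of acoustic feature frames
bn_feats = multilingual.bottleneck(frames)   # shape: (16, 80)

# (2) Transfer initialization: copy the shared layers into a target-language
# model and re-initialize only the language-specific output layer.
target = BottleneckDNN(n_targets=3500)       # target language has its own targets
target.encoder.load_state_dict(multilingual.encoder.state_dict())
```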
On small datasets, discriminatively trained bottleneck features from deep networks commonly outperform more traditional spectral or cepstral features. While these features are typically trained with small, fully connected networks, recent studies have used more sophisticated networks with great success. We use the recent deep CNN (VGG) network for bottleneck feature extraction, previously applied only to low-resource tasks, and apply it to the Switchboard English conversational telephone speech task. Unlike features derived from traditional MLP networks, the VGG features outperform cepstral features even when used with BLSTM acoustic models trained on large amounts of data. We achieve the best BBN single-system performance when combining the VGG features with a BLSTM acoustic model. When decoding with an n-gram language model, as is typical for deployable systems, we obtain a realistic production system with a WER of 7.4%. This result is competitive with the current state of the art in the literature. While our focus is on realistic single-system performance, we further reduce the WER to 6.1% through system combination and computationally expensive neural network language model rescoring.
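As a concrete illustration of the feature extractor described above, the following PyTorch sketch shows a VGG-style convolutional network producing bottleneck features from log-mel spectrogram input. The channel counts, context window, and bottleneck dimension are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of a VGG-style convolutional bottleneck feature extractor; all
# dimensions below are assumptions for illustration.
import torch
import torch.nn as nn

def vgg_block(in_ch, out_ch):
    # Two 3x3 convolutions followed by 2x2 max pooling, the basic VGG pattern.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
    )

class VGGBottleneck(nn.Module):
    def __init__(self, bn_dim=80):
        super().__init__()
        self.convs = nn.Sequential(vgg_block(1, 64), vgg_block(64, 128))
        # With 40 mel bins and a 16-frame context window, two 2x2 poolings
        # leave a 128 x 4 x 10 feature map (assumed dimensions).
        self.bottleneck = nn.Linear(128 * 4 * 10, bn_dim)

    def forward(self, x):                 # x: (batch, 1, 16 frames, 40 bins)
        h = self.convs(x).flatten(1)
        return self.bottleneck(h)         # BN features fed to the acoustic model

feats = torch.randn(8, 1, 16, 40)
bn = VGGBottleneck()(feats)               # shape: (8, 80)
```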