Congestion is a fundamental problem in VLSI design flows. Typically, it is handled by feeding back density information to the placers and routers. Fast and accurate congestion estimation is key to obtaining a design flow with fewer iterations and higher predictability. Fast congestion prediction is based on an accurate approximation of the actual routing engine. In this paper we show experimentally that the number of two-pin nets with more than two bends in the actual router is negligible. It is also established that the ratio between the number of L-shapes and Z-shapes is more or less constant. A fast and accurate algorithm for congestion prediction is developed. The above observations are translated into probabilities that are used to "smear" out a net over its possible realizations. Extensive experimental evidence is provided using industrial designs.
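The probabilistic "smearing" described above can be sketched as follows: for a two-pin net, expected routing demand is spread over its two L-shaped realizations and its one-jog Z-shaped realizations on a routing grid. This is a minimal illustrative sketch, not the paper's algorithm; the edge encoding (`('H', x, y)` for a horizontal edge, `('V', x, y)` for a vertical one) and the probability split `p_l`/`p_z` are assumptions standing in for the measured L/Z ratio.

```python
from collections import defaultdict

def h_edges(xa, xb, y):
    """Horizontal grid edges on row y between columns xa and xb."""
    lo, hi = sorted((xa, xb))
    return [('H', x, y) for x in range(lo, hi)]

def v_edges(x, ya, yb):
    """Vertical grid edges on column x between rows ya and yb."""
    lo, hi = sorted((ya, yb))
    return [('V', x, y) for y in range(lo, hi)]

def smear_net(src, dst, p_l=0.8, p_z=0.2):
    """Expected per-edge usage of a two-pin net, smeared over its
    L-shaped and single-jog Z-shaped realizations.
    p_l and p_z are illustrative placeholders for the measured ratio."""
    (x1, y1), (x2, y2) = src, dst
    usage = defaultdict(float)
    # Two L-shapes: horizontal-first and vertical-first.
    l_routes = [
        h_edges(x1, x2, y1) + v_edges(x2, y1, y2),
        v_edges(x1, y1, y2) + h_edges(x1, x2, y2),
    ]
    # Z-shapes: one jog at each interior column (HVH) or interior row (VHV).
    z_routes = []
    lox, hix = sorted((x1, x2))
    for xm in range(lox + 1, hix):
        z_routes.append(h_edges(x1, xm, y1) + v_edges(xm, y1, y2)
                        + h_edges(xm, x2, y2))
    loy, hiy = sorted((y1, y2))
    for ym in range(loy + 1, hiy):
        z_routes.append(v_edges(x1, y1, ym) + h_edges(x1, x2, ym)
                        + v_edges(x2, ym, y2))
    # Distribute probability mass uniformly within each shape class.
    for route in l_routes:
        for edge in route:
            usage[edge] += p_l / len(l_routes)
    if z_routes:
        for route in z_routes:
            for edge in route:
                usage[edge] += p_z / len(z_routes)
    return usage
```

Summing the expected usage of one net over a region of the grid, for all nets, yields the congestion map that the paper's method feeds back to placement and routing.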
We report on investigations, conducted at the 2006 Johns Hopkins Workshop, into the use of articulatory features (AFs) for observation and pronunciation models in speech recognition. In the area of observation modeling, we use the outputs of AF classifiers both directly, in an extension of hybrid HMM/neural network models, and as part of the observation vector, an extension of the "tandem" approach. In the area of pronunciation modeling, we investigate a model having multiple streams of AF states with soft synchrony constraints, for both audio-only and audio-visual recognition. The models are implemented as dynamic Bayesian networks, and tested on tasks from the Small-Vocabulary Switchboard (SVitchboard) corpus and the CUAVE audio-visual digits corpus. Finally, we analyze AF classification and forced alignment using a newly collected set of feature-level manual transcriptions.
This paper introduces the Voices Obscured In Complex Environmental Settings (VOICES) corpus, a freely available dataset under Creative Commons BY 4.0. This dataset will promote speech and signal processing research on speech recorded by far-field microphones in noisy room conditions. Publicly available speech corpora are mostly composed of isolated speech recorded by close-range microphones. A typical approach to better represent realistic scenarios is to convolve clean speech with noise and simulated room responses for model training. Despite these efforts, model performance degrades when tested against uncurated speech in natural conditions. For this corpus, audio was recorded in furnished rooms with background noise played in conjunction with foreground speech selected from the LibriSpeech corpus. Multiple sessions were recorded in each room to accommodate all foreground speech-background noise combinations. Audio was recorded using twelve microphones placed throughout the room, resulting in 120 hours of audio per microphone. This work is a multi-organizational effort led by SRI International and Lab41 with the intent to push forward state-of-the-art distant microphone approaches in signal processing and speech recognition.
We address the problem of subselecting a large set of acoustic data to train automatic speech recognition (ASR) systems. To this end, we apply a novel data selection technique based on constrained submodular function maximization. Though NP-hard, the combinatorial optimization problem can be approximately solved by a simple and scalable greedy algorithm with constant-factor guarantees. We evaluate our approach by subselecting data from 1300 hours of conversational English telephone data to train two types of large-vocabulary speech recognizers, one with Gaussian mixture model (GMM) based acoustic models, and another based on deep neural networks (DNNs). We show that training data can be reduced significantly, and that our technique outperforms both random selection and a previously proposed selection method utilizing comparable resources. Notably, using the submodular selection method, the DNN system using only about 5% of the training data is able to achieve performance on par with the GMM system using 100% of the training data; with the baseline subset selection methods, however, the DNN system is unable to match this performance.
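The greedy algorithm referenced above can be sketched in a few lines: for a monotone submodular utility under a cardinality constraint, repeatedly adding the item with the largest marginal gain achieves the classic (1 - 1/e) approximation guarantee. This is a generic illustrative sketch, not the paper's implementation; the coverage-style utility in the usage example is an assumed stand-in for whatever submodular objective the authors optimize over acoustic data.

```python
def greedy_select(items, utility, budget):
    """Greedily pick up to `budget` items maximizing a monotone
    submodular set function `utility` (called on a list of items)."""
    selected = []
    remaining = list(items)
    while len(selected) < budget and remaining:
        # Marginal gain of each candidate given the current selection.
        gains = [utility(selected + [x]) - utility(selected)
                 for x in remaining]
        best = max(range(len(remaining)), key=gains.__getitem__)
        if gains[best] <= 0:
            break  # No candidate improves the objective.
        selected.append(remaining.pop(best))
    return selected
```

For example, with a coverage utility `utility(S) = |union of feature sets in S|`, greedy selection naturally favors utterances that cover acoustic phenomena not yet represented in the subset; in practice a lazy-evaluation variant is used to make this scale to corpus-sized item pools.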