We describe procedures and experimental results using speech from diverse source languages to build an ASR system for a single target language. This work is intended to improve ASR in languages for which large amounts of training data are not available. We have developed both knowledge-based and automatic methods to map phonetic units from the source languages to the target language. We employed HMM adaptation techniques and Discriminative Model Combination to combine acoustic models from the individual source languages for recognition of speech in the target language. Experiments are described in which Czech Broadcast News is transcribed using acoustic models trained from small amounts of Czech read speech augmented by English, Spanish, Russian, and Mandarin acoustic models.
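As a rough illustration of how scores from several source-language acoustic models might be fused, the sketch below combines per-hypothesis log-likelihoods with a weighted log-linear sum, in the spirit of Discriminative Model Combination. The model names, scores, and weights are hypothetical; in practice the weights would be trained discriminatively on held-out target-language data.

```python
import numpy as np

def combine_log_scores(log_scores, weights):
    """Log-linear combination of acoustic-model scores, in the spirit of
    Discriminative Model Combination: score(x) = sum_k w_k * log p_k(x).

    log_scores: array of shape (num_models, num_hypotheses)
    weights:    array of shape (num_models,), typically tuned to minimize
                word error on held-out target-language data.
    """
    log_scores = np.asarray(log_scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return weights @ log_scores  # shape: (num_hypotheses,)

# Toy usage: scores for 3 hypotheses from English-, Spanish-, and Czech-trained models.
scores = [[-120.4, -118.9, -125.0],   # English-trained model
          [-119.7, -121.3, -124.1],   # Spanish-trained model
          [-115.2, -116.8, -119.5]]   # small Czech model
weights = [0.2, 0.1, 0.7]             # hypothetical, learned weights
combined = combine_log_scores(scores, weights)
print(int(np.argmax(combined)))       # index of the best-scoring hypothesis
```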
In large-scale online multi-user communities, the phenomenon of 'participation inequality' has been described as roughly following a 90-9-1 rule [9]. In this paper, we examine crowdsourcing participation levels inside the enterprise (within a company's firewall) and show that it is possible to achieve a more equitable distribution of 33-66-1. Accordingly, we propose a SCOUT ((S)uper Contributor, (C)ontributor, and (OUT)lier) model for describing user participation based on quantifiable effort-level metrics. In support of this framework, we present an analysis that correlates the quantity of contributions with responses to motivation and incentives. In conclusion, SCOUT provides task-based categories for characterizing the participation inequality evident in online communities and, crucially, also demonstrates the inequality curve (and its associated characteristics) in the enterprise domain.
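As a toy illustration of the kind of categorization the SCOUT model implies, the sketch below buckets users by raw contribution counts. The thresholds and the choice of contribution count as the effort metric are assumptions for illustration only, not the paper's actual effort-level metrics.

```python
def scout_category(num_contributions, super_threshold=50, contributor_threshold=5):
    """Bucket a user into the SCOUT categories by contribution volume.

    The thresholds here are hypothetical; the paper bases its categories
    on quantifiable effort-level metrics rather than these cut-offs.
    """
    if num_contributions >= super_threshold:
        return "Super Contributor"
    if num_contributions >= contributor_threshold:
        return "Contributor"
    return "OUTlier"

counts = {"alice": 120, "bob": 7, "carol": 1}   # invented example data
print({user: scout_category(n) for user, n in counts.items()})
```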
We introduce the relative rank differential statistic, a non-parametric approach to document and dialog analysis based on word-frequency rank statistics. We also present a simple method to establish semantic saliency in dialog, documents, and dialog segments using these rank statistics. Applications of our technique include dynamic tracking of topic and semantic evolution in a dialog, topic detection, automatic generation of document tags, and new story or event detection in conversational speech and text. Our approach benefits from the robustness, simplicity, and efficiency of non-parametric, rank-based methods and consistently outperformed term-frequency and TF-IDF cosine-distance approaches in several experiments.
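The abstract does not define the statistic precisely, but one plausible reading is sketched below: rank words by how far they rise in the frequency ranking of a segment relative to a background corpus, and treat large rank gains as markers of semantic saliency. The function names, the handling of unseen words, and the tie-breaking rule are assumptions.

```python
from collections import Counter

def frequency_ranks(tokens):
    """Map each word to its frequency rank (1 = most frequent); ties broken alphabetically."""
    counts = Counter(tokens)
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {w: r for r, w in enumerate(ordered, start=1)}

def rank_differential(segment_tokens, background_tokens, top_n=5):
    """Score words by how much they move up the frequency ranking inside a
    segment relative to a background corpus; large gains suggest salience."""
    seg_ranks = frequency_ranks(segment_tokens)
    bg_ranks = frequency_ranks(background_tokens)
    default = len(bg_ranks) + 1                      # rank assumed for unseen background words
    gains = {w: bg_ranks.get(w, default) - r for w, r in seg_ranks.items()}
    return sorted(gains.items(), key=lambda kv: -kv[1])[:top_n]

background = "the cat sat on the mat the dog ran".split()
segment = "the reactor core reactor temperature rose".split()
print(rank_differential(segment, background))        # salient segment words rank first
```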
We present a method to reduce the degradation in recognition accuracy introduced by full-rate GSM RPE-LTP coding by combining sets of acoustic models trained under different distortion conditions. During recognition, the a posteriori probabilities of an utterance are calculated as a weighted sum of the posteriors corresponding to the individual models. The phonemes used in the system's word pronunciations are grouped into classes according to the amount of distortion they undergo in coding. The acoustic model used in decoding is a weighted combination of models derived from clean speech and models derived from speech degraded by GSM coding (the source models), with the relative combination of the two sources depending on the extent to which each class of phonemes is degraded by the coding process. To determine distortion-class membership, and hence the weights, we measure the spectral distortion introduced into the quantized long-term residual by the RPE-LTP codec, and we discuss how this distortion varies by phonetic class. The proposed method reduces the degradation in recognition accuracy introduced by GSM coding of TIMIT sentences by more than 70% relative to the baseline accuracy obtained in matched training and testing conditions with the source acoustic models, and by up to 60% relative to the best baseline systems regardless of the number of Gaussians.
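A minimal sketch of the per-class model interpolation described above, assuming the weight is a scalar in [0, 1] derived from the measured spectral distortion of each phonetic class; the class names and weight values below are hypothetical.

```python
def combined_posterior(p_clean, p_gsm, distortion_weight):
    """Interpolate posteriors from clean-trained and GSM-trained models.

    p_clean, p_gsm:    posterior probabilities of the utterance (or frame)
                       under the clean and GSM-degraded acoustic models.
    distortion_weight: in [0, 1]; larger when the phoneme class is more
                       heavily distorted by the codec, so the GSM-trained
                       model is trusted more.
    """
    return (1.0 - distortion_weight) * p_clean + distortion_weight * p_gsm

# Hypothetical per-class weights derived from measured spectral distortion.
class_weights = {"vowels": 0.2, "fricatives": 0.7, "stops": 0.5}
print(combined_posterior(0.62, 0.48, class_weights["fricatives"]))
```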
We present a new, efficient algorithm for top-N match retrieval of sequential patterns. Our approach is based on an incremental approximation of the string edit distance using index information and a stack-based search. It produces hypotheses with an average edit error of about 0.29 edits from the optimal SED result while using only about 5% of the CPU computation.
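For context, the sketch below implements the exhaustive baseline that the proposed algorithm approximates: the exact string edit distance computed against every pattern, keeping the N closest. The incremental, index- and stack-based search itself is not reproduced here, and the example strings are invented.

```python
import heapq

def edit_distance(a, b):
    """Standard dynamic-programming string edit distance (Levenshtein)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def top_n_matches(query, patterns, n=5):
    """Exhaustive top-N retrieval by exact edit distance; the paper's
    index- and stack-based search approximates this result far faster."""
    return heapq.nsmallest(n, patterns, key=lambda p: edit_distance(query, p))

print(top_n_matches("recieve", ["receive", "recipe", "relieve", "review"], n=2))
```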
In this work we present the Subsequence Similarity Language Model (S2-LM), a new approach to language modeling based on string similarity. As a language model, S2-LM generates scores based on the closest-matching string in a very large corpus. We describe the properties and advantages of our approach and efficient methods to carry out its computation. We also describe an n-best rescoring experiment intended to show that S2-LM can be adjusted to behave like an n-gram language model.
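A minimal sketch of the scoring idea, assuming the score of a hypothesis is its similarity to the closest-matching corpus sentence; difflib's ratio is used here as a stand-in for the paper's subsequence similarity measure, and the corpus and n-best list are invented for illustration.

```python
import difflib

def s2lm_score(hypothesis, corpus_sentences):
    """Score a hypothesis by its similarity to the closest-matching corpus
    sentence; an illustrative stand-in for the S2-LM score."""
    return max(difflib.SequenceMatcher(None, hypothesis, s).ratio()
               for s in corpus_sentences)

corpus = ["the meeting starts at nine",
          "the meeting was postponed",
          "nine people attended the meeting"]
nbest = ["the meeting starts at night", "the meaning starts at nine"]
# Rescore an n-best list: prefer the hypothesis closest to any corpus string.
print(max(nbest, key=lambda h: s2lm_score(h, corpus)))
```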