Juan M. Huerta scite author profile

et al.

We describe procedures and experimental results using speech from diverse source languages to build an ASR system for a single target language. This work is intended to improve ASR in languages for which large amounts of training data are not available. We have developed both knowledge based and automatic methods to map phonetic units from the source languages to the target language. We employed HMM adaptation techniques and Discriminative Model Combination to combine acoustic models from the individual source languages for recognition of speech in the target language. Experiments are described in which Czech Broadcast News is transcribed using acoustic models trained from small amounts of Czech read speech augmented by English, Spanish, Russian, and Mandarin acoustic models.

Designing crowdsourcing community for the enterprise

Stewart

Sader

2009

Crowdsourcing participation inequality

Stewart

Lubensky

2010

In large-scale online multi-user communities, the phenomenon of 'participation inequality' has been described as generally following a more or less 90-9-1 rule [9]. In this paper, we examine crowdsourcing participation levels inside the enterprise (within a company's firewall) and show that it is possible to achieve a more equitable distribution of 33-66-1. Accordingly, we propose a SCOUT ((S)uper Contributor, (C)ontributor, and (OUT)lier) model for describing user participation based on quantifiable effort-level metrics. In support of this framework, we present an analysis that measures the quantity of contributions correlated with responses to motivation and incentives. In conclusion, SCOUT provides the task-based categories to characterize the participation inequality that is evident in online communities and, crucially, also demonstrates the inequality curve (and associated characteristics) in the enterprise domain.

Relative rank statistics for dialog analysis

2008

We introduce the relative rank differential statistic, a non-parametric approach to document and dialog analysis based on word frequency rank statistics. We also present a simple method to establish semantic saliency in dialog, documents, and dialog segments using these word frequency rank statistics. Applications of our technique include the dynamic tracking of topic and semantic evolution in a dialog, topic detection, automatic generation of document tags, and new story or event detection in conversational speech and text. Our approach benefits from the robustness, simplicity, and efficiency of non-parametric and rank-based approaches, and it consistently outperformed term-frequency and TF-IDF cosine distance approaches in several experiments.
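The abstract does not give the formula for the statistic, so the following is only a minimal sketch of a rank-based saliency score in the same spirit. The definition used here (reference rank minus segment rank, divided by segment rank) and the names `rank_words` and `relative_rank_differential` are assumptions made for illustration, not the paper's definition.

```python
from collections import Counter


def rank_words(tokens):
    """Map each word to its frequency rank (1 = most frequent).

    Ties are broken alphabetically so the ranking is deterministic.
    """
    counts = Counter(tokens)
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: position + 1 for position, word in enumerate(ordered)}


def relative_rank_differential(segment_tokens, reference_tokens):
    """Score each word in a segment by how far its rank moved up
    relative to a reference collection.

    ASSUMED definition (not taken from the paper):
        score(w) = (rank_reference(w) - rank_segment(w)) / rank_segment(w)
    Words absent from the reference get a rank one past its last rank.
    """
    segment_ranks = rank_words(segment_tokens)
    reference_ranks = rank_words(reference_tokens)
    unseen_rank = len(reference_ranks) + 1
    return {
        word: (reference_ranks.get(word, unseen_rank) - rank) / rank
        for word, rank in segment_ranks.items()
    }


if __name__ == "__main__":
    reference = "the the the a a call".split()
    segment = "call call the".split()
    scores = relative_rank_differential(segment, reference)
    # Words with the highest positive score are treated as most salient.
    for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(word, score)
```

Under this assumed definition, a word that is rare in the reference but frequent in the segment gets a large positive score, while common function words score near or below zero. Because only ranks are used, no smoothing or probability estimates are needed.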

Distortion-class modeling for robust speech recognition under GSM RPE-LTP coding

Stern

2001

Speech Communication

We present a method to reduce the degradation in recognition accuracy introduced by full-rate GSM RPE-LTP coding by combining sets of acoustic models trained under different distortion conditions. During recognition, the a posteriori probabilities of an utterance are calculated as a weighted sum of the posteriors corresponding to the individual models. The phonemes used by the system's word pronunciations are grouped into classes according to the amount of distortion they undergo in coding. The acoustic model used in the decoding process is a weighted combination of models derived from clean speech and models derived from speech that had been degraded by GSM coding (the source models), with the relative combination of the two sources depending on the extent to which each class of phonemes is degraded by the coding process. To determine the distortion class membership, and hence the weights, we measure the spectral distortion introduced to the quantized long-term residual by the RPE-LTP codec. We discuss how this distortion varies according to phonetic class. The method described reduces the degradation in recognition accuracy introduced by GSM coding of sentences in the TIMIT database by more than 70% relative to the baseline accuracy obtained in matched training and testing conditions with respect to a system using the source acoustic models, and up to 60% relative to the best baseline systems regardless of the number of Gaussians.
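The class-dependent weighted sum described in this abstract can be sketched roughly as follows. The class names, weight values, and phoneme-to-class table are hypothetical placeholders, and the assumption that more heavily distorted classes lean more on the GSM-trained models is ours; the abstract states only that the weights depend on the degree of distortion.

```python
# Hypothetical weights on the GSM-trained model for each distortion class.
# The values and the direction of the weighting are assumptions.
CLASS_WEIGHTS = {"low": 0.2, "medium": 0.5, "high": 0.8}

# Hypothetical assignment of phonemes to distortion classes.
PHONEME_CLASS = {"aa": "low", "m": "medium", "s": "high"}


def combine_scores(score_clean, score_gsm, weight_gsm):
    """Weighted sum of the scores from the clean and GSM-coded models."""
    if not 0.0 <= weight_gsm <= 1.0:
        raise ValueError("weight_gsm must lie in [0, 1]")
    return (1.0 - weight_gsm) * score_clean + weight_gsm * score_gsm


def combined_score(phoneme, score_clean, score_gsm):
    """Look up the phoneme's distortion class and apply that class's weight."""
    weight_gsm = CLASS_WEIGHTS[PHONEME_CLASS[phoneme]]
    return combine_scores(score_clean, score_gsm, weight_gsm)


if __name__ == "__main__":
    # A phoneme in the "high" class draws mostly on the GSM-trained model.
    print(combined_score("s", score_clean=0.1, score_gsm=0.3))
    # A phoneme in the "low" class draws mostly on the clean-speech model.
    print(combined_score("aa", score_clean=0.4, score_gsm=0.2))
```

In a real recognizer the combination would be applied to state-level acoustic scores inside the decoder; the scalar version here only illustrates how a per-class weight selects the mix of the two source models.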