Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.
DOI: 10.1109/icassp.2005.1415077

Improved Phonetic Speaker Recognition Using Lattice Decoding

Abstract: The current "state-of-the-art" in phonetic speaker recognition uses relative frequencies of phone n-grams as features for training speaker models and for scoring test-target pairs. Typically, these relative frequencies are computed from a simple 1-best phone decoding of the input speech. In this paper, we present results on the Switchboard-2 corpus, where we compare 1-best phone decodings versus lattice phone decodings for the purposes of performing phonetic speaker recognition. The phone decodings are used to…
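To make the 1-best versus lattice contrast concrete, here is a minimal Python sketch (not from the paper: the lattice is approximated as a handful of alternative phone paths with posterior weights, and all phone strings are illustrative) of how relative frequencies of phone bigrams differ between the two decodings:

```python
from collections import Counter

def ngram_relfreq_1best(phones, n=2):
    """Relative frequencies of phone n-grams from a single 1-best decoding."""
    counts = Counter(tuple(phones[i:i + n]) for i in range(len(phones) - n + 1))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

def ngram_relfreq_lattice(paths, n=2):
    """Expected relative frequencies from a lattice, here approximated as
    a list of (phone_sequence, posterior_probability) pairs."""
    counts = Counter()
    for phones, post in paths:
        for i in range(len(phones) - n + 1):
            counts[tuple(phones[i:i + n])] += post
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

# Toy example: hypothetical decodings of the same utterance.
one_best = ["sil", "ah", "b", "aa", "sil"]
lattice = [(["sil", "ah", "b", "aa", "sil"], 0.6),
           (["sil", "ah", "p", "aa", "sil"], 0.4)]
print(ngram_relfreq_1best(one_best))      # bigram ("ah","p") gets zero mass
print(ngram_relfreq_lattice(lattice))     # bigram ("ah","p") gets fractional mass
```

The lattice version spreads probability mass over competing hypotheses instead of committing to one, which is the paper's motivation for lattice decoding.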

Cited by 31 publications (29 citation statements)
References 6 publications (16 reference statements)
“…Our use of expected counts differs from Saraclar and Sproat [2004] in that we estimate probability models from the expected counts. Conceptually, our method of estimating language models from expected term frequencies is close to that of Hatch et al. [2005] and that of . In practice, however, our method differs from both works in a number of ways.…”
Section: Contributions of Our Work
confidence: 99%
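The recipe this citation describes — estimate a probability model from expected counts, then score a test-target pair — can be sketched as follows. This is a hedged illustration, not the cited method: estimate_lm, llr_score, the additive-smoothing constant, and the toy counts are all assumptions.

```python
import math

def estimate_lm(expected_counts, vocab, smooth=0.5):
    """Unigram model from (possibly fractional) expected counts,
    with additive smoothing so unseen terms get nonzero probability."""
    total = sum(expected_counts.values()) + smooth * len(vocab)
    return {w: (expected_counts.get(w, 0.0) + smooth) / total for w in vocab}

def llr_score(test_counts, speaker_lm, background_lm):
    """Log-likelihood ratio of the test counts under speaker vs. background model."""
    return sum(c * (math.log(speaker_lm[w]) - math.log(background_lm[w]))
               for w, c in test_counts.items())

vocab = {"ah b", "b aa", "p aa"}
speaker_lm = estimate_lm({"ah b": 3.2, "b aa": 2.7}, vocab)
background_lm = estimate_lm({"ah b": 1.0, "b aa": 1.0, "p aa": 1.0}, vocab)
print(llr_score({"ah b": 1.4, "b aa": 0.9}, speaker_lm, background_lm))
```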
“…In practice, however, our method differs from both works in a number of ways. While Hatch et al. [2005] derive phone bigram statistics for representing phonotactics, we derive word statistics for representing semantics. In addition, while the other work estimates the probability of unseen phone n-grams using lower-order n-gram statistics, this smoothing approach is inapplicable in our case, as we derive word models where the model vocabulary is large and where the sparse data problem is of a different nature.…”
Section: Contributions of Our Work
confidence: 99%
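The lower-order smoothing mentioned in the quote above can be illustrated with a minimal backoff sketch. Assumptions: a stupid-backoff-style weight of 0.4 and fractional expected counts; this is not the exact scheme used by either paper.

```python
def backoff_bigram_prob(w1, w2, bigram, unigram, alpha=0.4):
    """P(w2 | w1): relative frequency if the bigram was observed,
    otherwise back off to a scaled unigram probability."""
    if bigram.get((w1, w2), 0.0) > 0.0:
        return bigram[(w1, w2)] / unigram[w1]
    return alpha * unigram[w2] / sum(unigram.values())

unigram = {"ah": 2.0, "b": 1.6, "p": 0.4}              # expected phone counts
bigram = {("ah", "b"): 1.6}                            # expected bigram counts
print(backoff_bigram_prob("ah", "b", bigram, unigram))  # seen: 1.6 / 2.0 = 0.8
print(backoff_bigram_prob("ah", "p", bigram, unigram))  # unseen: backed-off estimate
```

With a small phone inventory this backoff is cheap; the citing authors' point is that it stops working once the vocabulary is large word types rather than phones.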
“…The strongest motivation for it comes from two related properties: any feature distribution, to the extent that it matches the background distribution, is warped to a uniform distribution over the interval $[0, 1]$. Conversely, the kernel-induced distance between datapoints, $D(x, y)^2 = K(x, x) + K(y, y) - 2K(x, y) = \|x - y\|^2$ (in the case of a linear kernel $K(x, y)$), is such that along any single feature dimension, two points $x$ and $y$ are separated by a distance proportional to the number of background data samples falling between $x$ and $y$. In other words, the normalization stretches the feature space in areas of high population density and shrinks it in areas of low density.…”
Section: Rank Normalization
confidence: 99%
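The property described in this quote is easy to see in code: mapping each feature value through the empirical CDF of a background set sends matched data to approximately uniform $[0, 1]$ values, and the distance between two normalized values counts the background samples falling between them. A minimal sketch (the background values are made up):

```python
import bisect

def rank_normalize(x, background):
    """Warp a scalar feature through the background empirical CDF into [0, 1]."""
    bg = sorted(background)
    return bisect.bisect_left(bg, x) / len(bg)

background = [0.1, 0.4, 0.5, 0.9, 1.3, 2.0, 2.2, 4.0]
x, y = 0.45, 2.1
d = abs(rank_normalize(x, background) - rank_normalize(y, background))
# 4 of the 8 background samples (0.5, 0.9, 1.3, 2.0) lie between x and y,
# so the normalized distance is 4/8 = 0.5.
print(d)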
“…Here we use the phone recognition-based modeling paradigm of [3] with the lattice-based refinement of [6]. An English open-loop phone recognizer is run on each conversation side, generating lattices.…”
Section: Phone N-gram Features
confidence: 99%
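The expected phone n-gram counts behind this lattice-based refinement come from arc posteriors computed over the recognizer's lattices. A minimal forward-backward sketch on a toy acyclic lattice (the lattice format, likelihood values, and arc_posteriors helper are illustrative assumptions, not the actual recognizer output):

```python
from collections import defaultdict

def arc_posteriors(arcs, start, end):
    """Forward-backward over an acyclic phone lattice.
    arcs: (src, dst, phone, likelihood) tuples in topological order over
    integer node ids. Returns the posterior probability of each arc."""
    fwd = defaultdict(float)
    fwd[start] = 1.0
    for src, dst, phone, lik in arcs:
        fwd[dst] += fwd[src] * lik
    bwd = defaultdict(float)
    bwd[end] = 1.0
    for src, dst, phone, lik in reversed(arcs):
        bwd[src] += lik * bwd[dst]
    total = fwd[end]
    return [(phone, fwd[src] * lik * bwd[dst] / total)
            for src, dst, phone, lik in arcs]

# Toy lattice: two competing phones between nodes 1 and 2.
arcs = [(0, 1, "ah", 1.0), (1, 2, "b", 0.6), (1, 2, "p", 0.3), (2, 3, "aa", 1.0)]
for phone, post in arc_posteriors(arcs, 0, 3):
    print(phone, round(post, 3))   # "b" -> 0.667, "p" -> 0.333
```

Expected n-gram counts would then accumulate the joint posteriors of consecutive arcs, giving the fractional counts that replace 1-best counts in the speaker models.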