Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.
DOI: 10.1109/icassp.2005.1415077

Improved Phonetic Speaker Recognition Using Lattice Decoding

Abstract: The current "state-of-the-art" in phonetic speaker recognition uses relative frequencies of phone n-grams as features for training speaker models and for scoring test-target pairs. Typically, these relative frequencies are computed from a simple 1-best phone decoding of the input speech. In this paper, we present results on the Switchboard-2 corpus, where we compare 1-best phone decodings versus lattice phone decodings for the purposes of performing phonetic speaker recognition. The phone decodings are used to…
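To make the 1-best versus lattice contrast concrete, here is a minimal Python sketch (not from the paper: the lattice is approximated as a handful of alternative phone paths with posterior weights, and all phone strings are illustrative) of how relative frequencies of phone bigrams differ between the two decodings:

```python
from collections import Counter

def ngram_relfreq_1best(phones, n=2):
    """Relative frequencies of phone n-grams from a single 1-best decoding."""
    counts = Counter(tuple(phones[i:i + n]) for i in range(len(phones) - n + 1))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

def ngram_relfreq_lattice(paths, n=2):
    """Expected relative frequencies from a lattice, here approximated as
    a list of (phone_sequence, posterior_probability) pairs."""
    counts = Counter()
    for phones, post in paths:
        for i in range(len(phones) - n + 1):
            counts[tuple(phones[i:i + n])] += post
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

# Toy example: hypothetical decodings of the same utterance.
one_best = ["sil", "ah", "b", "aa", "sil"]
lattice = [(["sil", "ah", "b", "aa", "sil"], 0.6),
           (["sil", "ah", "p", "aa", "sil"], 0.4)]
print(ngram_relfreq_1best(one_best))      # bigram ("ah","p") gets zero mass
print(ngram_relfreq_lattice(lattice))     # bigram ("ah","p") gets fractional mass
```

The lattice version spreads probability mass over competing hypotheses instead of committing to one, which is the paper's motivation for lattice decoding.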

Cited by 31 publications (29 citation statements)
References 6 publications (16 reference statements)
“…Our use of expected counts differs from Saraclar and Sproat [2004] in that we estimate probability models from the expected counts. Conceptually, our method of estimating language models from expected term frequencies is close to that of Hatch et al. [2005] and that of . In practice, however, our method differs from both works in a number of ways.…”
Section: Contributions of Our Work
confidence: 99%
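The recipe this citation describes — estimate a probability model from expected counts, then score a test-target pair — can be sketched as follows. This is a hedged illustration, not the cited method: estimate_lm, llr_score, the additive-smoothing constant, and the toy counts are all assumptions.

```python
import math

def estimate_lm(expected_counts, vocab, smooth=0.5):
    """Unigram model from (possibly fractional) expected counts,
    with additive smoothing so unseen terms get nonzero probability."""
    total = sum(expected_counts.values()) + smooth * len(vocab)
    return {w: (expected_counts.get(w, 0.0) + smooth) / total for w in vocab}

def llr_score(test_counts, speaker_lm, background_lm):
    """Log-likelihood ratio of the test counts under speaker vs. background model."""
    return sum(c * (math.log(speaker_lm[w]) - math.log(background_lm[w]))
               for w, c in test_counts.items())

vocab = {"ah b", "b aa", "p aa"}
speaker_lm = estimate_lm({"ah b": 3.2, "b aa": 2.7}, vocab)
background_lm = estimate_lm({"ah b": 1.0, "b aa": 1.0, "p aa": 1.0}, vocab)
print(llr_score({"ah b": 1.4, "b aa": 0.9}, speaker_lm, background_lm))
```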
“…In practice, however, our method differs from both works in a number of ways. While Hatch et al. [2005] derive phone bigram statistics for representing phonotactics, we derive word statistics for representing semantics. In addition, while the other work estimates the probability of unseen phone n-grams using lower-order n-gram statistics, this smoothing approach is inapplicable in our case, as we derive word models where the model vocabulary is large and where the sparse data problem is of a different nature.…”
Section: Contributions of Our Work
confidence: 99%
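The lower-order smoothing mentioned in the quote above can be illustrated with a minimal backoff sketch. Assumptions: a stupid-backoff-style weight of 0.4 and fractional expected counts; this is not the exact scheme used by either paper.

```python
def backoff_bigram_prob(w1, w2, bigram, unigram, alpha=0.4):
    """P(w2 | w1): relative frequency if the bigram was observed,
    otherwise back off to a scaled unigram probability."""
    if bigram.get((w1, w2), 0.0) > 0.0:
        return bigram[(w1, w2)] / unigram[w1]
    return alpha * unigram[w2] / sum(unigram.values())

unigram = {"ah": 2.0, "b": 1.6, "p": 0.4}              # expected phone counts
bigram = {("ah", "b"): 1.6}                            # expected bigram counts
print(backoff_bigram_prob("ah", "b", bigram, unigram))  # seen: 1.6 / 2.0 = 0.8
print(backoff_bigram_prob("ah", "p", bigram, unigram))  # unseen: backed-off estimate
```

With a small phone inventory this backoff is cheap; the citing authors' point is that it stops working once the vocabulary is large word types rather than phones.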
“…The strongest motivation for it comes from two related properties: any feature distribution, to the extent that it matches the background distribution, is warped to a uniform distribution over the interval $[0, 1]$. Conversely, the kernel-induced distance between datapoints, $D(x, y)^2 = K(x, x) + K(y, y) - 2K(x, y) = \|x - y\|^2$ (in the case of a linear kernel $K(x, y)$), is such that along any single feature dimension, two points $x$ and $y$ are separated by a distance proportional to the number of background data samples falling between $x$ and $y$. In other words, the normalization stretches the feature space in areas of high population density and shrinks it in areas of low density.…”
Section: Rank Normalization
confidence: 99%
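The property described in this quote is easy to see in code: mapping each feature value through the empirical CDF of a background set sends matched data to approximately uniform $[0, 1]$ values, and the distance between two normalized values counts the background samples falling between them. A minimal sketch (the background values are made up):

```python
import bisect

def rank_normalize(x, background):
    """Warp a scalar feature through the background empirical CDF into [0, 1]."""
    bg = sorted(background)
    return bisect.bisect_left(bg, x) / len(bg)

background = [0.1, 0.4, 0.5, 0.9, 1.3, 2.0, 2.2, 4.0]
x, y = 0.45, 2.1
d = abs(rank_normalize(x, background) - rank_normalize(y, background))
# 4 of the 8 background samples (0.5, 0.9, 1.3, 2.0) lie between x and y,
# so the normalized distance is 4/8 = 0.5.
print(d)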
“…Here we use the phone recognition-based modeling paradigm of [3] with the lattice-based refinement of [6]. An English open-loop phone recognizer is run on each conversation side, generating lattices.…”
Section: Phone N-gram Features
confidence: 99%
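The expected phone n-gram counts behind this lattice-based refinement come from arc posteriors computed over the recognizer's lattices. A minimal forward-backward sketch on a toy acyclic lattice (the lattice format, likelihood values, and arc_posteriors helper are illustrative assumptions, not the actual recognizer output):

```python
from collections import defaultdict

def arc_posteriors(arcs, start, end):
    """Forward-backward over an acyclic phone lattice.
    arcs: (src, dst, phone, likelihood) tuples in topological order over
    integer node ids. Returns the posterior probability of each arc."""
    fwd = defaultdict(float)
    fwd[start] = 1.0
    for src, dst, phone, lik in arcs:
        fwd[dst] += fwd[src] * lik
    bwd = defaultdict(float)
    bwd[end] = 1.0
    for src, dst, phone, lik in reversed(arcs):
        bwd[src] += lik * bwd[dst]
    total = fwd[end]
    return [(phone, fwd[src] * lik * bwd[dst] / total)
            for src, dst, phone, lik in arcs]

# Toy lattice: two competing phones between nodes 1 and 2.
arcs = [(0, 1, "ah", 1.0), (1, 2, "b", 0.6), (1, 2, "p", 0.3), (2, 3, "aa", 1.0)]
for phone, post in arc_posteriors(arcs, 0, 3):
    print(phone, round(post, 3))   # "b" -> 0.667, "p" -> 0.333
```

Expected n-gram counts would then accumulate the joint posteriors of consecutive arcs, giving the fractional counts that replace 1-best counts in the speaker models.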