2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).
DOI: 10.1109/icassp.2003.1202765

Conditional pronunciation modeling in speaker detection

Abstract: In this paper, we present a conditional pronunciation modeling method for the speaker detection task that does not rely on acoustic vectors. Aiming at exploiting higher-level information carried by the speech signal, it uses time-aligned streams of phones and phonemes to model a speaker's specific pronunciation. Our system uses phonemes drawn from a lexicon of pronunciations of words recognized by an automatic speech recognition system to generate the phoneme stream and an open-loop phone recognizer to generate…
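
As a rough illustration of the approach the abstract describes, the sketch below estimates per-speaker conditional probability tables P(phone | phoneme) from time-aligned phoneme/phone pairs and scores a test alignment against a background model with a log-likelihood ratio. The function names, the add-alpha smoothing, and the toy symbols are illustrative assumptions, not the paper's exact formulation.

```python
from collections import defaultdict
import math

def train_cpm(aligned_pairs, alphabet_size, alpha=1.0):
    """Estimate P(phone | phoneme) from time-aligned pairs, with add-alpha smoothing (an assumption)."""
    counts = defaultdict(lambda: defaultdict(float))
    totals = defaultdict(float)
    for phoneme, phone in aligned_pairs:
        counts[phoneme][phone] += 1.0
        totals[phoneme] += 1.0

    def prob(phoneme, phone):
        # Smoothed conditional probability of observing `phone` given `phoneme`.
        return (counts[phoneme][phone] + alpha) / (totals[phoneme] + alpha * alphabet_size)

    return prob

def score_llr(test_pairs, speaker_model, background_model):
    """Average log-likelihood ratio of a test alignment: speaker CPM vs. background CPM."""
    llr = sum(math.log(speaker_model(pm, ph)) - math.log(background_model(pm, ph))
              for pm, ph in test_pairs)
    return llr / max(len(test_pairs), 1)

# Toy usage with hypothetical phoneme/phone symbols.
background = train_cpm([("AH", "ah"), ("AH", "aa"), ("T", "t"), ("T", "dx")], alphabet_size=40)
speaker = train_cpm([("AH", "aa"), ("T", "dx"), ("T", "dx")], alphabet_size=40)
print(score_llr([("AH", "aa"), ("T", "dx")], speaker, background))
```

In a full system, per the abstract, the phoneme stream would come from a lexicon-based alignment of the ASR output and the phone stream from an open-loop phone recognizer.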

Cited by 26 publications (23 citation statements)
References 8 publications

“…The results show that there is significant benefit of fusing high- and low-level features for speaker verification. Among the high-level features investigated, the conditional pronunciation modeling (CPM) technique [30] that extracts multilingual phone sequences from utterances achieves the best performance [20]. One limitation of the CPM in [30] is that it requires multi-lingual corpora to build speaker and background models.…”
Section: Introduction
“…Among the high-level features investigated, the conditional pronunciation modeling (CPM) technique [30] that extracts multilingual phone sequences from utterances achieves the best performance [20]. One limitation of the CPM in [30] is that it requires multi-lingual corpora to build speaker and background models. To overcome this limitation, Leung et al [32] proposed using articulatory feature (AF) streams to construct CPM and called the resulting models AFCPM.…”
Section: Introduction
“…Text-independent speaker verification systems typically extract speaker features from short-term spectra of speech signals to build speaker-dependent Gaussian mixture models (GMMs) [1]. Studies have shown that combining low-level acoustic information with high-level speaker information, such as the usage or duration of particular words, prosodic features, and articulatory features (AF), can improve speaker verification performance [2][3][4][5][6].…”
Section: Introduction
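
The score-level combination of low- and high-level information mentioned in the statements above can be illustrated with a simple weighted fusion. The sketch below is an assumption-laden example (min-max normalization and a fixed weight), not the specific fusion recipe used in the cited work.

```python
def min_max_normalize(scores):
    """Map a list of trial scores to [0, 1]; a degenerate set maps everything to 0.5."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.5 for s in scores]

def fuse(acoustic_scores, highlevel_scores, w=0.7):
    """Weighted-sum fusion of per-trial scores after per-system normalization."""
    a = min_max_normalize(acoustic_scores)
    h = min_max_normalize(highlevel_scores)
    return [w * ai + (1 - w) * hi for ai, hi in zip(a, h)]

# Toy usage with hypothetical per-trial scores from an acoustic and a high-level subsystem.
print(fuse([-1.2, 0.4, 2.1], [0.1, 0.3, 0.9]))
```
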
“…This line of research, which is generally referred to as phonetic speaker recognition, was pioneered by Andrews et al, who used relative frequencies of phone n-grams to capture sequential patterns in an individual's speech [1,2]. This work was subsequently extended in various papers, such as the work of the "SuperSID" team at the JHU 2002 Summer Workshop [5,6,7]. In 2003, Campbell et al used support vector machines (SVMs) to train phonetic speaker models [3].…”
Section: Introduction