2003
DOI: 10.1109/tsa.2003.809121

Multigrained modeling with pattern specific maximum likelihood transformations for text-independent speaker recognition

Abstract: We present a transformation-based, multigrained data modeling technique in the context of text-independent speaker recognition, aimed at mitigating difficulties caused by sparse training and test data. Both identification and verification are addressed, where we view the entire population as divided into the target population and its complement, which we refer to as the background population. First, we present our development of maximum likelihood transformation based recognition with diagonally constrained Gaussia…

Cited by 18 publications (7 citation statements)
References 18 publications
“…This phonetic mismatch problem has been attacked with phonetically-motivated tree structures [44,92] and by using a separate GMM for each phonetic class [40,61,81,180] or for parts of syllables [25]. As an example, phonetic GMM (PGMM) described in [40] used neural network classifier for 11 language independent broad phone classes.…”
Section: Gaussian Mixture Model
Citation type: mentioning
confidence: 99%
“…In an authentication situation, an Researchers, who've extensively studied biometric person recognition for more than 20 years, have developed technologies with varying degrees of success. [1][2][3] Most promising systems, despite performing well in controlled environments, suffered significantly when deployed in challenging environments such as an airplane cockpit or a moving vehicle. (A note about terminology: We use the term speaker recognition if the modality is only speech or an audio signal, otherwise we use the term person recognition.…”
Section: The Person Recognition Problem
Citation type: mentioning
confidence: 99%
“…But, at present, most speaker recognition systems use only audio data. [2] Under noisy conditions, of course, such systems are far from perfect for high-security applications, an observation that's equally valid for systems using only visual data. Poor picture quality, changes in pose and lighting conditions, inclement weather conditions, or varying facial expressions may significantly degrade person recognition performance.…”
Section: The Person Recognition Problem
Citation type: mentioning
confidence: 99%
“…The series of indistinguishable units that add up to make a sequence of words and hence a variety of languages (according to the manner and context of utterances) are termed as phonemes. The pronunciation of phonemes depends upon contextual effects, speaker characteristics and emotions [3]. Human speech is dynamic rather than static, since the articulators keep moving during articulation this fact leads to an assumption that we begin to articulate the next segment before completing the previous one that is events are all set before they occur [4].…”
Section: Introduction
Citation type: mentioning
confidence: 99%