2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2018
DOI: 10.23919/apsipa.2018.8659790

Investigation of Frame Alignments for GMM-based Digit-prompted Speaker Verification

Abstract: Frame alignments can be computed by different methods in GMM-based speaker verification. By incorporating a phonetic Gaussian mixture model (PGMM), we are able to compare the performance of alignments extracted from deep neural networks (DNN) and from the conventional hidden Markov model (HMM) in digit-prompted speaker verification. Based on the different characteristics of these two alignments, we present a novel content verification method to improve the system security without much computational overhead.…
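As a rough illustration of what a "frame alignment" contributes in a GMM-style system, the sketch below assumes an alignment is a per-frame weight over a set of classes (soft posteriors, as a DNN might output, or hard labels, as an HMM forced alignment might give) and accumulates the usual zeroth- and first-order statistics from it. The function names, shapes, and toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def one_hot_alignment(labels, num_classes):
    """Turn hard per-frame class labels into a (frames x classes) alignment matrix."""
    labels = np.asarray(labels)
    alignment = np.zeros((len(labels), num_classes))
    alignment[np.arange(len(labels)), labels] = 1.0
    return alignment

def accumulate_stats(features, alignment):
    """Accumulate per-class statistics from features (frames x dim) and an alignment.

    zeroth[c] = sum over frames of alignment[t, c]
    first[c]  = sum over frames of alignment[t, c] * features[t]
    """
    zeroth = alignment.sum(axis=0)
    first = alignment.T @ features
    return zeroth, first

# Toy data: three 2-dimensional frames, two classes (hypothetical values).
features = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Hard alignment: frame 0 belongs to class 0, frames 1 and 2 to class 1.
hard = one_hot_alignment([0, 1, 1], num_classes=2)
hard_zeroth, hard_first = accumulate_stats(features, hard)

# Soft alignment: every frame split evenly between the two classes.
soft = np.full((3, 2), 0.5)
soft_zeroth, soft_first = accumulate_stats(features, soft)
```

Only the alignment matrix changes between the two calls, which is why alignments from different sources can be compared while the rest of the pipeline stays fixed.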

Citations: cited by 4 publications (2 citation statements)
References: 23 publications (44 reference statements)
“…Most of machine learning is shallow model (such as GMM, HMM) [23][24][25], which means weak nonlinear transformation ability, so it is not enough to describe the complex high-dimensional features of speech. Since the input of GMM is a single frame, the influence of co-pronunciation is ignored, so we use the spliced frame as the input of the neural network to model the observation probability.…”
Section: Backbone (mentioning)
confidence: 99%
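The "spliced frame" input mentioned in the statement above can be illustrated with a minimal sketch: each frame is concatenated with a fixed number of neighbouring frames so that the network sees some temporal context. The context widths and the edge handling (repeating the first and last frame) are assumptions made for illustration, not details given by the citing paper.

```python
import numpy as np

def splice_frames(features, left=5, right=5):
    """Concatenate each frame with `left` preceding and `right` following frames.

    features: array of shape (num_frames, dim).
    Returns an array of shape (num_frames, (left + right + 1) * dim).
    Frames beyond the utterance edges are filled by repeating the edge frame.
    """
    num_frames, _ = features.shape
    padded = np.pad(features, ((left, right), (0, 0)), mode="edge")
    # Slice k holds, for every frame t, the frame at offset (k - left) from t.
    windows = [padded[k:k + num_frames] for k in range(left + right + 1)]
    return np.concatenate(windows, axis=1)

# Toy example: three 2-dimensional frames with one frame of context per side.
frames = np.arange(6, dtype=float).reshape(3, 2)
spliced = splice_frames(frames, left=1, right=1)
```

With one frame of context on each side, every output row holds three consecutive frames, so the feature dimension triples while the number of frames is unchanged.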
“…Text-dependent SV task allows us to compare utterances of the same phonetic context [6], [7], or random word sequences coming from a fixed vocabulary [8], [9]. With random sequences, such as random digit strings, an SV system is less vulnerable to replay attacks [8], [10], [11]. In this work, we study a neural acoustic-phonetic approach for SV of random digit strings in RSR2015 Part III database [12].…”
Section: Introduction (mentioning)
confidence: 99%