2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) 2016
DOI: 10.1109/apsipa.2016.7820903

System combination for short utterance speaker recognition

Abstract: Performance in text-independent short-utterance speaker recognition (SUSR) often degrades dramatically. This paper presents a combination approach to SUSR tasks with two phonetic-aware systems: one is the DNN-based i-vector system and the other is our recently proposed subregion-based GMM-UBM system. The former employs phone posteriors to construct an i-vector model in which the shared statistics offer stronger robustness against limited test data, while the latter establishes a phone-dependent GMM-UB…

Cited by 7 publications (5 citation statements)
References 13 publications
“…The DNN-based i-vector system significantly exceeds its relative baseline. This confirms the effectiveness of recognition methods [33]. In addition, it can be seen that the GMM-UBM baseline is superior to the two i-vector systems, but after using probabilistic linear discriminant analysis (PLDA) [34], the i-vector system is improved and outperforms the GMM-UBM system.…”
Section: Classifier System Deep Neural Network
Citation type: supporting
confidence: 63%
“…However, there are many cases where multiple samples are not available for comparison (only a single voice recording fragment is available). Several studies have been conducted using short utterances [17–20], but these generally do not meet the requirements of forensic evaluation: the evaluation datasets do not follow a strict protocol [21] or use techniques that have already been outperformed by deep learning techniques in regular speaker recognition. This study aims to investigate this scenario in two ways: only one sample is available for the unknown speaker and (i) only one or (ii) multiple samples are available for the known speaker.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Studies have shown that in very short‐duration cases, the classical GMM‐UBM‐based approach worked better with respect to the modern i‐vector‐based approach [40]. Fusion of multiple classifiers yielded considerable improvements over the standalone approaches [104]. Research in short‐utterance problem in ASV has seen efforts to accommodate phonetic distribution for speaker modelling [57, 97].…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…Studies have shown that in very short‐duration cases, the classical GMM‐UBM‐based approach worked better with respect to the modern i‐vector‐based approach [40]. Fusion of multiple classifiers yielded considerable improvements over the standalone approaches [104].…”
Section: Discussion
Citation type: mentioning
confidence: 99%