Large-scale speaker identification

Schmidt, Ludwig; Sharifi, Matthew; Moreno, Ignacio López

doi:10.1109/icassp.2014.6853878

Cited by 31 publications

(27 citation statements)

References 18 publications


“…Other work by Schmidt et al. [4] uses Locality-Sensitive Hashing (LSH) and a fast nearest-neighbor search algorithm for speaker indexing. Schmidt proposed an indexing method using i-vectors.…”

Section: Introduction (mentioning; confidence: 99%)

A New Strategy of Direct Access for Speaker Identification System Based on Classification

2015
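The statement above says only that Schmidt et al. combine LSH with fast nearest-neighbor search to index speakers. As a rough illustration of that idea, the sketch below indexes fixed-length speaker vectors (such as i-vectors) with random-hyperplane LSH for cosine similarity, then re-ranks only the colliding candidates exactly. The hash family, the numbers of bits and tables, and the names `HyperplaneLSH`, `add`, and `query` are assumptions for illustration, not details taken from the cited paper.

```python
import math
import random
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


class HyperplaneLSH:
    """Hypothetical random-hyperplane LSH index for cosine similarity."""

    def __init__(self, dim, num_bits=8, num_tables=4, seed=0):
        rng = random.Random(seed)
        # One set of num_bits random Gaussian hyperplanes per hash table.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.vectors = {}

    def _key(self, planes, v):
        # Each bit records which side of one hyperplane the vector falls on.
        return tuple(sum(p_i * v_i for p_i, v_i in zip(p, v)) >= 0.0 for p in planes)

    def add(self, speaker_id, v):
        """Store a speaker vector in every hash table."""
        self.vectors[speaker_id] = v
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, v)].append(speaker_id)

    def query(self, v, top_k=1):
        """Return the top_k enrolled speakers closest to v by cosine similarity."""
        # Union of candidates that share a bucket with the query in any table.
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._key(planes, v), []))
        # Exact cosine re-ranking of the (small) candidate set only.
        ranked = sorted(candidates, key=lambda s: cosine(self.vectors[s], v), reverse=True)
        return ranked[:top_k]
```

A query identical to an enrolled vector always lands in that vector's buckets; a noisy query may miss them, and adding tables trades memory for higher recall while adding bits makes buckets smaller and lookups faster.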


“…This section summarises the current work on I-vector and GMM-UBM approaches and other related work, alongside our previous work and other state-of-the-art methods [14], [22], [12], [13], [23], [24], and [5]. According to Table IV, the handset used was of G.712 type at 16 kHz, and all proposed noise measurements in this table were at an SNR of 30 dB with a mixture size of 256.…”

Section: Related Work (mentioning; confidence: 99%)
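The statement above reports noise conditions at an SNR of 30 dB for 16 kHz speech. A minimal sketch of adding white Gaussian noise at a target SNR follows, assuming the standard definition SNR_dB = 10·log10(P_signal / P_noise) with power averaged over the whole utterance; the cited papers' exact procedure and the G.712 handset filtering are not modeled, and the function names are hypothetical.

```python
import math
import random


def add_awgn(signal, snr_db, seed=None):
    """Return signal plus white Gaussian noise scaled to the requested SNR in dB."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """SNR of noisy relative to clean, using the noise that was actually added."""
    p_s = sum(x * x for x in clean) / len(clean)
    p_n = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(p_s / p_n)
```

For one second of a 440 Hz tone sampled at 16 kHz, `measured_snr_db(clean, add_awgn(clean, 30.0))` comes out close to 30 dB; the small deviation is the sampling variance of the random noise.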

“…Nevertheless, this study lacked a large number of speakers, as only 50 self-collected speakers were used. In [13], 1,000 speakers were selected from YouTube to construct an I-vector speaker identification framework, but this non-standard database did not include noisy conditions.…”

Section: Introduction (mentioning; confidence: 99%)

Comparison of I-vector and GMM-UBM approaches to speaker identification with TIMIT and NIST 2008 databases in challenging environments

Al-Kaltakchi, Woo, Dlay, et al., 2017

2017 25th European Signal Processing Conference (EUSIPCO)


Abstract: In this paper, two models, the I-vector and the Gaussian Mixture Model-Universal Background Model (GMM-UBM), are compared for the speaker identification task. Four feature combinations of I-vectors are considered with seven fusion techniques: maximum, mean, weighted sum, cumulative, interleaving, and concatenated, for both two and four features. In addition, an Extreme Learning Machine (ELM) is used to identify speakers, and the Speaker Identification Accuracy (SIA) is then calculated. Both systems are evaluated on clean speech from 120 speakers of the TIMIT and NIST 2008 databases. Furthermore, a comprehensive evaluation is carried out for the TIMIT database under Additive White Gaussian Noise (AWGN) conditions and with three types of Non-Stationary Noise (NSN), both with and without handset effects. The results show that the I-vector approach outperforms the GMM-UBM for both clean and AWGN conditions without a handset, whereas the GMM-UBM achieves better accuracy for the NSN types.
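The abstract reports results as Speaker Identification Accuracy (SIA) without defining it here. The sketch below assumes the usual closed-set definition: each test utterance is assigned to the enrolled speaker with the highest score, and SIA is the percentage of utterances assigned correctly. The function names are hypothetical, and the scoring back-end (ELM, GMM-UBM, or otherwise) is left abstract as a matrix of scores.

```python
def identify(score_rows, speaker_ids):
    """Closed-set identification: pick the enrolled speaker with the highest score.

    score_rows[u][s] is the score of test utterance u against enrolled speaker s.
    """
    return [speaker_ids[max(range(len(row)), key=row.__getitem__)] for row in score_rows]


def speaker_identification_accuracy(true_ids, predicted_ids):
    """SIA in percent: share of test utterances assigned to the correct speaker."""
    if len(true_ids) != len(predicted_ids):
        raise ValueError("true_ids and predicted_ids must have the same length")
    if not true_ids:
        raise ValueError("need at least one test utterance")
    correct = sum(t == p for t, p in zip(true_ids, predicted_ids))
    return 100.0 * correct / len(true_ids)
```

With three of four utterances identified correctly, the SIA under this definition is 75%.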


“…where both $i$ and $j$ take the values 1 and 2; therefore $f^{\text{weight}}_{ij}$ takes one of four values, $f^{\text{weight}}_{11}$, $f^{\text{weight}}_{12}$, $f^{\text{weight}}_{21}$, and $f^{\text{weight}}_{22}$, where $f^{\text{weight}}_{11}$ is the linear combination of $f_1$ and $g_1$, $f^{\text{weight}}_{12}$ is the linear combination of $f_1$ and $g_2$, and so on. For each $f^{\text{weight}}_{ij}$, $\omega_\beta$ can take one of four values, namely $\omega_\beta \in \{0.9, 0.8, 0.77, 0.7\}$, chosen empirically to give the best SIA.…”

Section: Fusion Strategies (mentioning; confidence: 99%)
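The statement above describes each fused term as a linear combination of some $f_i$ and $g_j$, with $i, j \in \{1, 2\}$ and $\omega_\beta \in \{0.9, 0.8, 0.77, 0.7\}$, but the excerpt gives neither the exact formula nor what $f$ and $g$ represent. The sketch below assumes the convex form $\omega_\beta f_i + (1 - \omega_\beta) g_j$ applied element-wise to equal-length score or feature vectors, and enumerates every $(i, j, \omega_\beta)$ combination; selecting the best weight by SIA would be a separate step. The function names are hypothetical.

```python
from itertools import product

OMEGAS = (0.9, 0.8, 0.77, 0.7)  # candidate weights quoted in the citation statement


def weighted_fusion(f, g, omega):
    """Element-wise omega * f + (1 - omega) * g (assumed form of the combination)."""
    return [omega * a + (1.0 - omega) * b for a, b in zip(f, g)]


def all_weighted_fusions(f_list, g_list, omegas=OMEGAS):
    """Map (i, j, omega) -> fused vector for every pair f_i, g_j (1-based indices)."""
    fused = {}
    for (i, f), (j, g) in product(enumerate(f_list, start=1), enumerate(g_list, start=1)):
        for omega in omegas:
            fused[(i, j, omega)] = weighted_fusion(f, g, omega)
    return fused
```

With two $f$ vectors and two $g$ vectors this yields the four pairs $(1,1), (1,2), (2,1), (2,2)$, each at four weights, for 16 candidate fusions in total.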

“…However, the identification rate using the NIST 2003 database was poor. In [13], approximately 1000 speakers were selected and recordings were made, including in an acoustics room, with noise, and with varying microphone distance. However, the conditions were perhaps unfair and a non-standard database (derived from YouTube) was used.…”

Section: Introduction (mentioning; confidence: 99%)

Evaluation of a speaker identification system with and without fusion using three databases in the presence of noise and handset effects

Al-Kaltakchi, Woo, Dlay, et al., 2017

EURASIP J. Adv. Signal Process.


In this study, a speaker identification system is considered that consists of a feature extraction stage using both power normalized cepstral coefficients (PNCCs) and Mel frequency cepstral coefficients (MFCCs). Normalization is applied using cepstral mean and variance normalization (CMVN) and feature warping (FW), and acoustic modeling uses a Gaussian mixture model-universal background model (GMM-UBM). The main contributions are comprehensive evaluations of the effect of both additive white Gaussian noise (AWGN) and non-stationary noise (NSN) (with and without a G.712 type handset) on identification performance. In particular, three NSN types, corresponding to street traffic, a bus interior, and a crowded talking environment, were tested at varying signal-to-noise ratios (SNRs). The performance evaluation also considered the effect of late fusion techniques based on score fusion, namely mean, maximum, and linear weighted sum fusion. The databases employed were TIMIT, SITW, and NIST 2008, and 120 speakers were selected from each database to yield 3600 speech utterances. The study recommends mean fusion, which yields the best overall speaker identification accuracy (SIA) with noisy speech, whereas linear weighted sum fusion performs best overall on the original database recordings.
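The abstract names cepstral mean and variance normalization (CMVN) as one of its normalization steps. A minimal per-utterance sketch follows, assuming statistics are computed over all frames of one utterance with the population variance; the function name and the `eps` guard are illustrative choices, and feature warping is not shown.

```python
import math


def cmvn(frames, eps=1e-10):
    """Per-utterance cepstral mean and variance normalization.

    frames: non-empty list of equal-length feature vectors (one per frame,
    e.g. MFCCs or PNCCs). Returns frames in which every coefficient has zero
    mean and (approximately) unit variance across the utterance.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(fr[d] for fr in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((fr[d] - means[d]) ** 2 for fr in frames) / n)
        for d in range(dim)
    ]
    # eps guards against division by zero when a coefficient is constant.
    return [[(fr[d] - means[d]) / (stds[d] + eps) for d in range(dim)] for fr in frames]
```

Normalizing per utterance removes a constant channel offset in the cepstral domain, which is one reason CMVN is commonly paired with handset and noise experiments like those described above.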

