The Speaker and Language Recognition Workshop (Odyssey 2016)
DOI: 10.21437/odyssey.2016-32
Channel Compensation for Speaker Recognition using MAP Adapted PLDA and Denoising DNNs

Abstract: Over several decades, speaker recognition performance has steadily improved for applications using telephone speech. A big part of this improvement has been the availability of large quantities of speaker-labeled data from telephone recordings. For new data applications, such as audio from room microphones, we would like to effectively use existing telephone data to build systems with high accuracy while maintaining good performance on existing telephone tasks. In this paper we compare and combine approaches t…

Cited by 11 publications (3 citation statements)
References 15 publications
“…Both theoretical derivation and experiments conducted on the NIST SRE10 core condition demonstrate that: 1) the parameterization of JB enables it to learn the intrinsic dimensionality of the identity subspace, which can reduce system complexity without performance degradation; 2) hidden-variable selection in JB makes the EM iterations converge faster with better parameter estimation; 3) EM with exact statistics performs better than with approximated statistics. For future work, it would be interesting to apply data domain adaptation [5], feature compensation [14,15], and nearest-neighbor discriminant analysis (NDA) [16,17], which have been successfully applied to PLDA, to JB to further improve performance.…”
Section: Discussion (mentioning)
confidence: 99%
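For context on the JB model discussed above: Joint Bayesian modeling decomposes a feature vector as x = μ + u + ε, where u is an identity component with covariance S_u and ε a within-identity residual with covariance S_ε, and verification scores a log-likelihood ratio between the same-identity and different-identity hypotheses. A minimal scoring sketch in Python (the NumPy/SciPy usage and all names are illustrative assumptions, not taken from the cited papers):

```python
import numpy as np
from scipy.stats import multivariate_normal

def jb_llr(x1, x2, S_u, S_e):
    """Joint Bayesian verification score: log p(x1,x2|same) - log p(x1,x2|diff).

    S_u: between-identity covariance; S_e: within-identity (residual) covariance.
    Features are assumed mean-centred; this is an illustrative sketch only.
    """
    d = x1.shape[0]
    pair = np.concatenate([x1, x2])
    # Under the same-identity hypothesis, the shared u induces cross-covariance S_u.
    cov_same = np.block([[S_u + S_e, S_u],
                         [S_u,       S_u + S_e]])
    # Under different identities, the two observations are independent.
    cov_diff = np.block([[S_u + S_e, np.zeros((d, d))],
                         [np.zeros((d, d)), S_u + S_e]])
    zero = np.zeros(2 * d)
    return (multivariate_normal.logpdf(pair, mean=zero, cov=cov_same)
            - multivariate_normal.logpdf(pair, mean=zero, cov=cov_diff))
```

In the cited work, S_u and S_ε would be estimated by EM; the statement above concerns exactly how those EM statistics are computed and how the hidden variables are selected.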
“…The input to the DNN is a 21-frame window of stacked 40-dimensional MFCCs (as in Section 3.3.1.2), and the target is the 40-dimensional feature vector at the center of the input window, extracted from the clean data. The general setup for this system is very close to that described in [12]. A 7x1024-layer senone-classifying DNN was trained on 300 hours of SWB with ~8K senone targets.…”
Section: MIT-LL Denoising Stats i-vector System (mentioning)
confidence: 99%
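To make the described denoising topology concrete: the network maps an 840-dimensional input (21 frames × 40 MFCCs) to the 40-dimensional clean centre frame and is trained with a regression loss. A minimal PyTorch sketch (the hidden-layer sizes, activation choice, and all names are assumptions; the cited system's exact setup is in [12]):

```python
import torch
import torch.nn as nn

class DenoisingDNN(nn.Module):
    """Map a 21-frame stack of 40-dim noisy MFCCs to the clean centre frame."""

    def __init__(self, context=21, feat_dim=40, hidden=1024, num_hidden=3):
        super().__init__()
        layers, in_dim = [], context * feat_dim  # 21 * 40 = 840 inputs
        for _ in range(num_hidden):
            layers += [nn.Linear(in_dim, hidden), nn.Sigmoid()]
            in_dim = hidden
        layers.append(nn.Linear(in_dim, feat_dim))  # linear output = clean frame
        self.net = nn.Sequential(*layers)

    def forward(self, x):  # x: (batch, 21 * 40)
        return self.net(x)

# Training objective: MSE between predicted and clean centre-frame features.
model = DenoisingDNN()
loss_fn = nn.MSELoss()
noisy = torch.randn(8, 21 * 40)  # stacked noisy MFCC windows (dummy data)
clean = torch.randn(8, 40)       # clean centre frames (dummy data)
loss = loss_fn(model(noisy), clean)
```

Per the surrounding statements, features denoised this way are then used for i-vector extraction in place of the noisy originals.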
“…It was recently found that clean i-vectors can be partly restored by translating and rotating the noisy i-vectors, where the translation vector and rotation matrix are found by the Kabsch algorithm [28]. Instead of denoising the i-vectors, spectral features can also be denoised by DNNs [29] or a denoising autoencoder [30] before i-vector extraction.…”
Section: Related Work (mentioning)
confidence: 99%
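The Kabsch algorithm finds the least-squares optimal rotation between two paired, centred point sets from an SVD of their cross-covariance; applied as in [28], noisy i-vectors are translated and rotated toward the clean space. A minimal NumPy sketch (function and variable names are illustrative, not from [28]):

```python
import numpy as np

def kabsch(noisy, clean):
    """Optimal rotation aligning rows of `noisy` to rows of `clean`.

    Both inputs are (n, d) matrices of paired vectors. Returns R and the two
    centroids so that (x - mu_n) @ R + mu_c approximates the clean vector.
    """
    mu_n, mu_c = noisy.mean(axis=0), clean.mean(axis=0)
    A, B = noisy - mu_n, clean - mu_c
    H = A.T @ B                               # d x d cross-covariance
    U, _, Vt = np.linalg.svd(H)
    sign = np.sign(np.linalg.det(U @ Vt))     # enforce a proper rotation (det = +1)
    D = np.diag([1.0] * (H.shape[0] - 1) + [sign])
    R = U @ D @ Vt
    return R, mu_n, mu_c

def restore(x_noisy, R, mu_n, mu_c):
    """Translate and rotate a noisy i-vector toward the clean i-vector space."""
    return (x_noisy - mu_n) @ R + mu_c
```

Estimating R requires parallel clean/noisy i-vector pairs; at test time only the stored R and centroids are applied, which makes the compensation cheap compared with retraining the extractor.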