2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07 2007
DOI: 10.1109/icassp.2007.366902

Model Complexity Selection and Cross-Validation EM Training for Robust Speaker Diarization

Abstract: Accurate modeling of speaker clusters is important in the task of speaker diarization. Creating accurate models involves both selection of the model complexity and optimum training given the data. Using models with fixed complexity and trained using the standard EM algorithm poses a risk of overfitting, which can lead to a reduction in diarization performance. In this paper a technique proposed by the author to estimate the complexity of a model is combined with a novel training algorithm called "Cross-Validat…

Cited by 10 publications (3 citation statements)
References 8 publications

“…Table 5 These results suggest that the CL level has a lower influence on the phone segments than on the pause segments. [31] using the Pa-2010 feature set. Table 6 gives the UAR on the Development set of the C2 and C2N classifiers for the three CLSE databases.…”
Section: Audio Analysis Of Pause and Phone Segments (citation type: mentioning)
confidence: 99%
“…Here, for speaker change detection αδ is regarded as the threshold. The main goal of the validation approach [11][12] is verification of speaker change point detected in Step 3. From the graph, P and Q are two peak points of the correlation coefficient curve between U, V, and W that are 3 adjoining valleys in Fig.…”
Section: B Segmentation Algorithm (citation type: mentioning)
confidence: 99%
“…Each state of an acoustic model contains several Gaussian mixtures. According to [14], a parametric modeling technique, model complexity selection, is chosen to perform on each state with sufficient training data, and we select the number of mixtures based on the number of data frames belonging to the phoneme state in the acoustic model. Model complexity selection works as follows: whenever there is a change in the amount of data assigned to a model, the number of the available training samples that are assigned to the model is used to determine the new number of mixtures in the GMM using: …”
Section: Model Complexity Selection (MCS) (citation type: mentioning)
confidence: 99%
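
The last citation statement is cut off before its formula, so the exact rule it refers to is not reproduced here. As a purely illustrative sketch of the general idea it describes (re-deriving the number of Gaussian mixtures from the amount of data assigned to a model), the Python below assumes a hypothetical constant frames-per-Gaussian ratio with lower and upper bounds. The function name, the ratio of 100 frames per Gaussian, and the bounds are all assumptions made for illustration and are not taken from the paper or from the citing work.

```python
def select_num_mixtures(num_frames, frames_per_gaussian=100,
                        min_mixtures=1, max_mixtures=64):
    """Pick a GMM size from the number of frames assigned to a model.

    Hypothetical rule (an assumption, not the source's formula): one
    Gaussian per `frames_per_gaussian` frames, rounded to the nearest
    integer and clamped to [min_mixtures, max_mixtures].
    """
    if num_frames < 0:
        raise ValueError("num_frames must be non-negative")
    if frames_per_gaussian <= 0:
        raise ValueError("frames_per_gaussian must be positive")
    if min_mixtures > max_mixtures:
        raise ValueError("min_mixtures must not exceed max_mixtures")

    # Integer round-half-up, avoiding Python's round-half-to-even behaviour.
    estimate = (num_frames + frames_per_gaussian // 2) // frames_per_gaussian

    # Clamp so that sparse models keep at least one Gaussian and
    # data-rich models do not grow without bound.
    return max(min_mixtures, min(max_mixtures, estimate))


if __name__ == "__main__":
    # Re-run the selection whenever the amount of assigned data changes.
    for frames in (10, 250, 1000, 100000):
        print(frames, "frames ->", select_num_mixtures(frames), "mixtures")
```

Under this assumed rule, a model's mixture count would be recomputed each time frames are reassigned to it, for example after a re-segmentation pass. The clamping is a design choice of the sketch to keep very small or very large models within a usable range.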