2008
DOI: 10.1109/icassp.2008.4518622

Speaker diarization of French broadcast news

Abstract: We report results on speaker diarization of French broadcast news and talk shows on current affairs. This speaker diarization process is a multistage segmentation and clustering system. One of the stages is agglomerative clustering using state-of-the-art speaker identification (SID) methods. For the GMMs used in this stage, we tried many different feature parameters, including MFCCs, Gaussianized MFCCs, Gaussianized MFCCs with cepstral mean subtraction, and Gaussianized MFCCs with cepstral mean subtraction co…
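The feature variants named in the abstract can be illustrated with a minimal sketch. It assumes that cepstral mean subtraction and Gaussianization are both computed over a whole utterance; the paper may well use a sliding window or other settings, which the truncated abstract does not state. The function names and the toy frames below are hypothetical and stand in for real MFCC vectors.

```python
from statistics import NormalDist, fmean


def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean, computed over all frames."""
    n_dims = len(frames[0])
    means = [fmean(f[d] for f in frames) for d in range(n_dims)]
    return [[f[d] - means[d] for d in range(n_dims)] for f in frames]


def gaussianize(frames):
    """Rank-based warping: map each coefficient to standard normal quantiles.

    Assumption: ranks are taken over the whole utterance, not a sliding window.
    """
    n = len(frames)
    n_dims = len(frames[0])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    out = [[0.0] * n_dims for _ in range(n)]
    for d in range(n_dims):
        # Frame indices sorted by the value of coefficient d.
        order = sorted(range(n), key=lambda i: frames[i][d])
        for rank, i in enumerate(order):
            # Mid-rank quantile keeps the argument strictly inside (0, 1).
            out[i][d] = std_normal.inv_cdf((rank + 0.5) / n)
    return out


# Hypothetical toy data: 3 frames with 2 cepstral coefficients each.
toy_frames = [[1.0, 10.0], [2.0, 30.0], [6.0, 20.0]]
cms_frames = cepstral_mean_subtraction(toy_frames)
warped_frames = gaussianize(cms_frames)
```

Because the warping depends only on ranks, applying it after mean subtraction gives the same result as applying it to the raw frames when both use the same whole-utterance span; the two steps differ only when their analysis windows differ.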

Cited by 19 publications (13 citation statements); references 6 publications.
“…Finally, the scaling factor r was set equal to 0.30 and 0.17 for the DGA and ELDA set, respectively. We compare the results of the proposed algorithm to those obtain by CRIM's primary system, described in [14]. Like the proposed method, the system uses the AHC algorithm to merge segments.…”
Section: Results (mentioning)
confidence: 99%
“…In full batch processing, we normalize the features of each speaker in a room to zero mean and compute a 100-dimensional i-vector from this speaker in the room. In order to assign utterances in a room to speakers, we carry out speaker diarization using a modified version of the multi-stage segmentation and clustering system [42] as described before.…”
Section: Results Obtained With Full Batch Processing (mentioning)
confidence: 99%
“…version of the multi-stage segmentation and clustering system [42]. The modification is that each utterance corresponds to one speaker.…”
Section: Algorithm Used For Decoding (mentioning)
confidence: 99%
“…The most commonly used are the Gaussian mixture models and the hidden Markov models [10,11,14,26,37,40]. Also widely used are the support vector machines [11,14,38,39,41], the artificial neural networks [10], the k-nearest neighbor algorithm [14,38], the decision trees [10,38], the genetic algorithms [2], the fuzzy logic [42] and boosting techniques [41,43]. Related architectures incorporate fusion frameworks among recognition models [28,44] and combination of model-based and distance based algorithms.…”
Section: Introduction (mentioning)
confidence: 99%
“…[41,43] Related architectures incorporate fusion frameworks among recognition models [28,44] and combination of model-based and distance based algorithms [13,26,27,39,40]. Postprocessing schemes can improve the overall recognition accuracy. Among the postprocessing schemes are (i) transformation of the feature matrix [23,44-46], (ii) correction of logical errors based on empirical rules [11], (iii) isolation of the segments of interest in cases where the post-processing is focused on specific classes [10,11,13,38,40,47] and (iv) merging of sound events and separation of them in a post-processing stage.…”
Section: Introduction (mentioning)
confidence: 99%