2006 IEEE International Conference on Multimedia and Expo 2006
DOI: 10.1109/icme.2006.262727

Automatic Speaker Segmentation using Multiple Features and Distance Measures: A Comparison of Three Approaches

Abstract: This paper addresses the problem of unsupervised speaker change detection. Three systems based on the Bayesian Information Criterion (BIC) are tested. The first system investigates the AudioSpectrumCentroid and the AudioWaveformEnvelope features, implements a dynamic thresholding followed by a fusion scheme, and finally applies BIC. The second method is a real-time one that uses a metric-based approach employing the line spectral pairs and the BIC to validate a potential speaker change point. The third method …
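As a rough illustration of how BIC can validate a candidate speaker change point, the sketch below uses the common delta-BIC test with one Gaussian per segment. It is not taken from the paper: the function name `delta_bic_1d`, the penalty weight `lam`, and the restriction to scalar features are assumptions made to keep the example short, and the three systems in the abstract may compute the criterion differently.

```python
import math

def delta_bic_1d(x, t, lam=1.0):
    """Delta-BIC for a candidate change point at index t (scalar features).

    Assumption: each segment is modeled by a single 1-D Gaussian.
    A positive value favors two separate models, i.e. a speaker change.
    """
    n = len(x)

    def log_var(seg):
        # Log of the maximum-likelihood variance of a segment.
        mean = sum(seg) / len(seg)
        return math.log(sum((v - mean) ** 2 for v in seg) / len(seg))

    # Likelihood gain from splitting the window at t.
    gain = 0.5 * (n * log_var(x) - t * log_var(x[:t]) - (n - t) * log_var(x[t:]))
    # The split adds two free parameters (one mean, one variance).
    penalty = 0.5 * lam * 2 * math.log(n)
    return gain - penalty

# Toy data: the second half is shifted, so a change at t = 100 is plausible.
changed = [0.0, 1.0] * 50 + [10.0, 11.0] * 50
unchanged = [0.0, 1.0] * 100
print(delta_bic_1d(changed, 100) > 0)    # split is favored
print(delta_bic_1d(unchanged, 100) > 0)  # split is not favored
```

For multidimensional features the log-variances would be replaced by log-determinants of the segment covariance matrices, and the penalty would count the extra mean and covariance parameters.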

Cited by 12 publications (25 citation statements)
References 11 publications
“…RCL and MDR are also improved with respect to the two remaining systems. Finally, the superiority of the proposed system against the three systems developed in [21] is demonstrated by the fact that its F1 value is relatively improved by 7.917%, 6.438%, and 28.007%, respectively. In [12], the used dataset was created by concatenating speaker utterances from the TIMIT database, too.…”
Section: Performance Discussion (mentioning)
confidence: 97%
“…It outperforms three other systems tested on a similar dataset, created by concatenating speakers from the TIMIT database, as described in [21]. Although the dataset in [21] is substantially smaller than the conTIMIT test dataset, the nature of the audio recordings is the same enabling us to conduct fair comparisons. The performance achieved by the previous approaches is summarized in Table XII.…”
Section: Performance Discussion (mentioning)
confidence: 98%
“…Features like the smoothed zero-crossing rate (SZCR), the perceptual minimum variance distortionless response (PMVDR), and the filterbank log-coefficients (FBLCs) are introduced in [53]. Additional features are derived from MPEG-7 audio standard such as AudioSpectrumCentroid, AudioWaveformEnvelope [7,8], AudioSpectrumEnvelope, and AudioSpectrumProjection [5,6].…”
Section: Feature Extraction (mentioning)
confidence: 99%
“…The term modified power spectrum coefficients means that the power spectrum coefficients corresponding to frequencies below 62.5 Hz are replaced by a single coefficient equal to their sum [7,8,52].…”
Section: AudioSpectrumCentroid (ASC) (mentioning)
confidence: 99%
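The modification described in the statement above can be sketched in a few lines. This is a minimal illustration, not code from any of the cited papers: the function name `modify_power_spectrum` and the parallel-list input layout are assumptions, and only the 62.5 Hz cutoff comes from the quoted text.

```python
def modify_power_spectrum(power, freqs, cutoff_hz=62.5):
    """Replace all coefficients below cutoff_hz by a single coefficient
    equal to their sum; coefficients at or above the cutoff are kept.

    power: power spectrum coefficients (hypothetical input layout)
    freqs: center frequency in Hz of each coefficient, same length
    """
    low_sum = sum(p for p, f in zip(power, freqs) if f < cutoff_hz)
    kept = [p for p, f in zip(power, freqs) if f >= cutoff_hz]
    return [low_sum] + kept

# Toy spectrum with 31.25 Hz bin spacing: the first two bins lie below 62.5 Hz.
freqs = [0.0, 31.25, 62.5, 93.75]
power = [1.0, 2.0, 3.0, 4.0]
print(modify_power_spectrum(power, freqs))  # [3.0, 3.0, 4.0]
```

Which frequency should be assigned to the merged coefficient when computing a centroid is not stated in the quoted text, so the sketch returns only the coefficients.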