Automatic Speaker Segmentation using Multiple Features and Distance Measures: A Comparison of Three Approaches

Kotti, Margarita; Martins, Luís Gustavo; Benetos, Emmanouil; Cardoso, Jaime S.; Kotropoulos, Constantine

doi:10.1109/icme.2006.262727

Cited by 12 publications (25 citation statements); references 11 publications.

“…RCL and MDR are also improved with respect to the two remaining systems. Finally, the superiority of the proposed system against the three systems developed in [21] is demonstrated by the fact that its F1 value is relatively improved by 7.917%, 6.438%, and 28.007%, respectively. In [12], the used dataset was created by concatenating speaker utterances from the TIMIT database, too.…”

Section: Performance Discussion (mentioning)
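The quoted comparison relies on recall (RCL), miss detection rate (MDR), F1, and relative F1 improvements. As a minimal sketch, assuming the usual definitions (F1 as the harmonic mean of precision and recall, MDR as the fraction of true change points that are missed, so MDR = 1 - RCL, and relative improvement measured against the baseline system's F1), these figures of merit can be computed as follows. The function names are hypothetical, and the example values are invented for illustration rather than taken from either paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1: harmonic mean of precision and recall (RCL)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def miss_detection_rate(recall: float) -> float:
    """MDR: fraction of true speaker change points that were missed, i.e. 1 - RCL."""
    return 1.0 - recall


def relative_improvement(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


if __name__ == "__main__":
    # Invented example values, not results reported in either paper.
    f1_new = f1_score(0.80, 0.90)
    f1_old = f1_score(0.70, 0.80)
    print(f"relative F1 improvement: {relative_improvement(f1_new, f1_old):.3f}%")
```

Reporting the improvement relative to the baseline, rather than as an absolute difference in F1, is what makes figures such as 28.007% comparable across systems with different baseline scores.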

“…It outperforms three other systems tested on a similar dataset, created by concatenating speakers from the TIMIT database, as described in [21]. Although the dataset in [21] is substantially smaller than the conTIMIT test dataset, the nature of the audio recordings is the same, enabling us to conduct fair comparisons. The performance achieved by the previous approaches is summarized in Table XII.…”

Section: Performance Discussion (mentioning)

Computationally Efficient and Robust BIC-Based Speaker Segmentation. Kotti, Benetos, Kotropoulos. IEEE Trans. Audio Speech Lang. Process., 2008. Self-citation.

An algorithm for automatic speaker segmentation based on the Bayesian Information Criterion (BIC) is presented. BIC tests are not performed for every window shift (e.g., every few milliseconds), as previously, but when a speaker change is most probable to occur. This is done by estimating the next probable change point by means of a model of utterance durations. It is found that the inverse Gaussian best fits the distribution of utterance durations. As a result, fewer BIC tests are needed, making the proposed system less computationally demanding in time and memory, and considerably more efficient with respect to missed speaker change points. A feature selection algorithm based on a branch-and-bound search strategy is applied in order to identify the most efficient features for speaker segmentation. Furthermore, a new theoretical formulation of BIC is derived by applying centering and simultaneous diagonalization. This formulation is considerably more computationally efficient than the standard BIC when the covariance matrices are estimated by estimators other than the usual maximum likelihood ones. Two commonly used pairs of figures of merit are employed, and their relationship is established. Computational efficiency is achieved through the speaker utterance modeling, whereas robustness is achieved by feature selection and by applying BIC tests at appropriately selected time instants. Experimental results indicate that the proposed modifications yield superior performance compared to existing approaches.
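The abstract builds on the standard BIC test for a speaker change inside an analysis window. A minimal sketch of that baseline test follows, assuming full-covariance Gaussian models with maximum-likelihood covariances and a penalty weight `lam` (an assumed tuning parameter, defaulting to 1). It deliberately shows the exhaustive scan over every candidate split that the paper sets out to avoid; it does not reproduce the inverse-Gaussian scheduling of tests or the centered, simultaneously diagonalized BIC formulation. Under this sign convention a positive ΔBIC favors two separate models, i.e. a change at `split`. The names `delta_bic` and `best_split` are hypothetical.

```python
import numpy as np


def delta_bic(x: np.ndarray, split: int, lam: float = 1.0) -> float:
    """Delta-BIC for a hypothesised speaker change at frame `split` of x (frames x dims).

    Positive values favour two full-covariance Gaussians (one per side of
    `split`) over a single Gaussian for the whole window, i.e. a change.
    """
    n, d = x.shape
    x1, x2 = x[:split], x[split:]
    n1, n2 = len(x1), len(x2)

    def logdet_cov(seg: np.ndarray) -> float:
        # Maximum-likelihood covariance: bias=True divides by the number of frames.
        cov = np.atleast_2d(np.cov(seg, rowvar=False, bias=True))
        sign, logdet = np.linalg.slogdet(cov)
        if sign <= 0:
            raise ValueError("singular covariance: each side needs more frames than dims")
        return float(logdet)

    # Model-complexity penalty: one extra mean vector (d) and covariance matrix (d(d+1)/2).
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return float(0.5 * n * logdet_cov(x)
                 - 0.5 * n1 * logdet_cov(x1)
                 - 0.5 * n2 * logdet_cov(x2)
                 - lam * penalty)


def best_split(x: np.ndarray, margin: int = 20, lam: float = 1.0) -> tuple[int, float]:
    """Exhaustively scan splits, keeping `margin` frames on each side; return the best one."""
    scores = {t: delta_bic(x, t, lam) for t in range(margin, len(x) - margin + 1)}
    t_best = max(scores, key=scores.get)
    return t_best, scores[t_best]
```

For example, on 600 synthetic 4-dimensional frames whose mean jumps at frame 300, `best_split` should return a split near 300 with a positive score, while a single homogeneous Gaussian segment should give a negative ΔBIC at its midpoint. Every candidate split costs three log-determinants here, which is the per-shift cost the abstract's duration model is meant to reduce.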

“…Features like the smoothed zero-crossing rate (SZCR), the perceptual minimum variance distortionless response (PMVDR), and the filterbank log-coefficients (FBLCs) are introduced in [53]. Additional features are derived from the MPEG-7 audio standard, such as AudioSpectrumCentroid, AudioWaveformEnvelope [7,8], AudioSpectrumEnvelope, and AudioSpectrumProjection [5,6].…”

Section: Feature Extraction (mentioning)

“…The term modified power spectrum coefficients means that the power spectrum coefficients corresponding to frequencies below 62.5 Hz are replaced by a single coefficient equal to their sum [7,8,52].…”

Section: AudioSpectrumCentroid (ASC) (mentioning)
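The quoted definition of the modified power spectrum can be turned into a small AudioSpectrumCentroid (ASC) computation. The sketch below assumes an MPEG-7 style centroid measured on a log2 frequency axis relative to 1 kHz, and it assigns the merged sub-62.5 Hz coefficient a nominal frequency of 31.25 Hz (an assumption stated in the code). It uses the plain FFT power spectrum of a single unwindowed frame, which is a simplification rather than the standard's exact extraction pipeline. The function names are hypothetical.

```python
import numpy as np


def modified_power_spectrum(power: np.ndarray, freqs: np.ndarray, cutoff: float = 62.5):
    """Merge all bins below `cutoff` Hz into one leading bin holding their summed power."""
    low = freqs < cutoff
    merged_power = np.concatenate(([power[low].sum()], power[~low]))
    # Assumption: the merged bin gets the nominal frequency cutoff / 2 (31.25 Hz by default).
    # This also removes the 0 Hz bin, so log2 below never sees a zero frequency.
    merged_freqs = np.concatenate(([cutoff / 2.0], freqs[~low]))
    return merged_power, merged_freqs


def audio_spectrum_centroid(frame: np.ndarray, sample_rate: float) -> float:
    """Power-weighted mean of log2(f / 1000 Hz), i.e. a centroid in octaves around 1 kHz."""
    # Simplification: plain FFT power of a single unwindowed frame.
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    p, f = modified_power_spectrum(power, freqs, cutoff=62.5)
    total = p.sum()
    if total == 0:
        raise ValueError("silent frame: the centroid is undefined")
    return float(np.sum(np.log2(f / 1000.0) * p) / total)
```

With a 16 kHz sampling rate and 1024-sample frames, a pure 1 kHz tone falls exactly on an FFT bin and gives a centroid very close to 0 octaves, and a 2 kHz tone gives a value very close to 1. Merging the lowest bins keeps the DC and near-DC energy from dominating a log-frequency average.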

“…The MPEG-7 standard developed by the Moving Picture Experts Group can be used to describe efficiently a speech recording [3,4]. For example, MPEG-7 low-level audio feature descriptors such as AudioSpectrumProjection, AudioSpectrumEnvelope [5,6], AudioSpectrumCentroid, AudioWaveformEnvelope [7,8] can be used. MPEG-7 high-level tools, such as SpokenContent, that exploit speakers' word usage or prosodic features, could also be exploited.…”

Section: Introduction (mentioning)

Speaker segmentation and clustering. 2008. Self-citation.

This survey focuses on two challenging speech processing topics, namely: speaker segmentation and speaker clustering. Speaker segmentation aims at finding speaker change points in an audio stream, whereas speaker clustering aims at grouping speech segments based on speaker characteristics. Model-based, metric-based, and hybrid speaker segmentation algorithms are reviewed. Concerning speaker clustering, deterministic and probabilistic algorithms are examined. A comparative assessment of the reviewed algorithms is undertaken, the algorithm advantages and disadvantages are indicated, insight into the algorithms is offered, and deductions as well as recommendations are given. Rich transcription and movie analysis are candidate applications that benefit from combined speaker segmentation and clustering.
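The survey distinguishes model-based from metric-based segmentation. As a rough illustration of the metric-based family, the sketch below slides two adjacent windows over a feature sequence, fits a full-covariance Gaussian to each, scores the pair with the symmetric Kullback-Leibler divergence (KL2), and reports local maxima above a threshold, discarding weaker peaks within one window length of a stronger one. KL2 is only one example distance chosen here for illustration; the window length `win` and `threshold` are assumed tuning parameters, the function names are hypothetical, and none of this is claimed to be the method of the surveyed or cited papers.

```python
import numpy as np


def gaussian_kl2(x1: np.ndarray, x2: np.ndarray) -> float:
    """Symmetric KL divergence KL(p1||p2) + KL(p2||p1) between Gaussians fitted to x1 and x2."""
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    c1 = np.atleast_2d(np.cov(x1, rowvar=False))
    c2 = np.atleast_2d(np.cov(x2, rowvar=False))
    i1, i2 = np.linalg.inv(c1), np.linalg.inv(c2)
    diff = m1 - m2
    d = len(m1)
    # The log-determinant terms of the two one-sided divergences cancel in the sum.
    return float(0.5 * (np.trace(i2 @ c1) + np.trace(i1 @ c2) - 2 * d
                        + diff @ (i1 + i2) @ diff))


def distance_curve(x: np.ndarray, win: int) -> np.ndarray:
    """KL2 between the `win` frames before and the `win` frames after each candidate frame."""
    return np.array([gaussian_kl2(x[t - win:t], x[t:t + win])
                     for t in range(win, len(x) - win + 1)])


def detect_changes(x: np.ndarray, win: int, threshold: float) -> list[int]:
    """Frames where the distance curve peaks above `threshold`.

    Weaker peaks closer than `win` frames to a stronger kept peak are discarded.
    """
    curve = distance_curve(x, win)
    peaks = [i for i in range(1, len(curve) - 1)
             if curve[i] > threshold and curve[i] >= curve[i - 1] and curve[i] >= curve[i + 1]]
    kept: list[int] = []
    for i in sorted(peaks, key=lambda j: curve[j], reverse=True):
        if all(abs(i - k) >= win for k in kept):
            kept.append(i)
    # Curve index i corresponds to frame t = i + win.
    return sorted(k + win for k in kept)
```

Unlike the BIC test, this family needs an explicit threshold on the distance, which is one reason hybrid schemes that combine a cheap distance pass with a model-based check are common.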

Incomplete-Data-Driven Speaker Segmentation for Diarization Application; A Help-Training Approach. Teimoori, Razzazi. Circuits Syst Signal Process, 2018.
