A spectral clustering approach to speaker diarization

Ning, Hui; Liu, Ming; Tang, Hao; Huang, Thomas S.

doi:10.21437/interspeech.2006-566

Cited by 35 publications (7 citation statements)

References 2 publications


“…The most popular method for text-dependent speaker verification [9,10,11] and text-independent speaker verification [7,8,5,6] is probabilistic linear discriminant analysis (PLDA) [5,6]. For text-independent speaker detection, hybrid techniques with deep learning-based components have also shown promise [12,13,14].…”

Section: Related Work (mentioning)


A Methodology for Speaker Diazaration System Based on LSTM and MFCC Coefficients

Indu D.

2024

jes


Speaker identification has long been a difficult research problem. A speaker can be identified automatically by comparing a voice sample with previously recorded samples of their voice, and machine learning approaches to this task have grown in popularity in recent years. Convolutional neural networks (CNNs) and deep neural networks (DNNs) are among the machine learning techniques employed recently. This article builds on a successful d-vector-based speaker verification system to construct a new approach to speaker diarization. In particular, we use an LSTM to cluster speech segments represented by MFCC features and to identify the speakers in the diarization system. The proposed system will be evaluated using benchmark performance metrics and compared with other models. The article also considers the need to apply the LSTM network to both acoustic data and linguistic dialect. LSTM networks can produce reliable speaker segmentation outputs.


“…Neural networks have rapidly been adopted in speaker diarization systems in recent work [7], [8], [9], [10]. In most of the literature, speaker diarization systems use text-based speaker verification and detection to identify the same speaker.…”

Section: Introduction (mentioning)


A Methodology for Speaker Diazaration System Based on LSTM and MFCC Coefficients

Indu D.

2024

jes


“…After model training, temporal continuity of the similarity scores can be incorporated [15]. This is done by multiplying the similarity score s(i, j) by an exponential decay term, $\tilde{s}(i, j) = s(i, j)\,\beta^{\min(n_b,\,|i-j|)}$ (10), where β is a positive decay factor with β < 1, |i−j| is the absolute difference between the segment indices of the embeddings from the ith and jth segments, and n_b is the maximum exponent applied to the decay constant β, so the decay saturates at β^{n_b}.…”
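Eq. (10) can be sketched in a few lines of NumPy. This is a minimal illustration that assumes the similarity scores for one recording are already collected in a square matrix indexed by segment order; the function name `apply_temporal_decay` and the example values β = 0.5, n_b = 2 are illustrative choices, not taken from the citing paper.

```python
import numpy as np


def apply_temporal_decay(scores: np.ndarray, beta: float, n_b: int) -> np.ndarray:
    """Scale s(i, j) by beta ** min(n_b, |i - j|), following Eq. (10)."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie in (0, 1)")
    n = scores.shape[0]
    idx = np.arange(n)
    # |i - j| for every pair of segment indices
    dist = np.abs(idx[:, None] - idx[None, :])
    # The exponent is capped at n_b, so far-apart pairs all get beta ** n_b
    return scores * beta ** np.minimum(n_b, dist)


# Hypothetical example: four segments whose raw scores are all 1.0
decayed = apply_temporal_decay(np.ones((4, 4)), beta=0.5, n_b=2)
print(decayed[0])  # first row: [1.0, 0.5, 0.25, 0.25]
```

Capping the exponent means segments farther apart than n_b are all down-weighted by the same factor, so distant repetitions of a speaker are penalized but not driven toward zero.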

Section: Choice of Hyperparameters (mentioning)


“…Clustering algorithms commonly pre-process the input embeddings with techniques such as length normalization [8] and principal component analysis (PCA) [9], and compute the affinity matrix using PLDA [4]. Another common approach to clustering is spectral clustering [10]. In most of these approaches, the affinity matrix computation and the clustering are performed as two independent steps with different cost functions.…”
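A minimal NumPy sketch of the pre-processing chain named in this statement: length normalization, then PCA, then an affinity matrix. One substitution to flag: cosine similarity stands in for PLDA scoring, since a PLDA backend needs trained parameters that the quoted text does not provide. The function names, the embedding size, and the component count are hypothetical.

```python
import numpy as np


def length_normalize(x: np.ndarray) -> np.ndarray:
    """Scale every embedding (row) to unit L2 norm."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def pca_project(x: np.ndarray, n_components: int) -> np.ndarray:
    """Center the embeddings and project them onto the top principal axes."""
    centered = x - x.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def cosine_affinity(x: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarities (a stand-in for PLDA scores)."""
    xn = length_normalize(x)
    return xn @ xn.T


# Hypothetical example: 10 segment embeddings of dimension 32
rng = np.random.default_rng(0)
emb = rng.normal(size=(10, 32))
reduced = pca_project(length_normalize(emb), n_components=5)
aff = cosine_affinity(reduced)
print(aff.shape)  # (10, 10)
```

Here the affinity is computed with a fixed formula after the embeddings are fixed, which is exactly the two-step separation the statement points out.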

Section: Introduction (mentioning)


Self-Supervised Metric Learning With Graph Clustering For Speaker Diarization

Singh

Ganapathy

2021

2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)


In this paper, we propose a novel algorithm for speaker diarization using metric learning for graph-based clustering. The graph clustering algorithms use an adjacency matrix of similarity scores computed between speaker embeddings extracted from pairs of audio segments within the given recording. We propose an approach that jointly learns the speaker embeddings and the similarity metric using principles of self-supervised learning. The metric learning network implements a neural model of probabilistic linear discriminant analysis (PLDA). The self-supervision is derived from pseudo labels obtained from a previous iteration of clustering. The entire model of representation learning and metric learning is trained with a binary cross-entropy loss. By combining self-supervised metric learning with the graph-based clustering algorithm, we achieve significant relative improvements in diarization error rate (DER) of 60% and 7% over the x-vector PLDA agglomerative hierarchical clustering (AHC) approach on the AMI and DIHARD datasets, respectively.
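The pseudo-label self-supervision described in this abstract can be illustrated with a generic pairwise loss: segments that share a pseudo label from the previous clustering pass are treated as positive pairs, all other pairs as negatives, and the pairwise score logits are penalized with binary cross-entropy. This is a sketch under stated assumptions, not the authors' neural PLDA model; the score matrix is assumed to hold raw logits, and `pseudo_label_targets` and `pairwise_bce` are hypothetical helpers.

```python
import numpy as np


def pseudo_label_targets(labels) -> np.ndarray:
    """1.0 where two segments share a pseudo label, 0.0 otherwise."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)


def pairwise_bce(logits: np.ndarray, labels) -> float:
    """Mean binary cross-entropy over all unordered segment pairs."""
    targets = pseudo_label_targets(labels)
    # Upper triangle without the diagonal: each pair once, no self-pairs
    iu = np.triu_indices(len(targets), k=1)
    z, t = logits[iu], targets[iu]
    # Numerically stable BCE-with-logits: max(z, 0) - z*t + log(1 + exp(-|z|))
    loss = np.maximum(z, 0.0) - z * t + np.log1p(np.exp(-np.abs(z)))
    return float(loss.mean())


# Hypothetical pseudo labels from a previous clustering pass
pseudo = [0, 0, 1, 1]
scores = np.random.default_rng(0).normal(size=(4, 4))
print(pairwise_bce(scores, pseudo))
```

In a full system the logits would come from a trainable scoring network and the loss would be minimized by gradient descent; the sketch only shows how pseudo labels turn into pairwise targets.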


“…In contrast, spectral clustering (SC; Luxburg, 2007) does not require a statistical metric to determine whether two clusters should be merged. Previous studies have applied SC to infer speaker clusters with good performance (Iso, 2010; Ning et al., 2010), especially in speaker diarization tasks (Ning et al., 2006).…”
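For readers unfamiliar with SC, here is a minimal NumPy sketch of one textbook variant, the normalized algorithm of Ng, Jordan and Weiss as presented in the von Luxburg (2007) tutorial. It is not necessarily the exact recipe of Ning et al. (2006): the Gaussian kernel width `sigma` and a known number of clusters are assumptions, and the final k-means step uses a simple deterministic farthest-point initialization chosen for this sketch.

```python
import numpy as np


def spectral_cluster(emb: np.ndarray, n_clusters: int, sigma: float = 1.0,
                     n_iter: int = 100) -> np.ndarray:
    """Normalized spectral clustering (Ng-Jordan-Weiss style) of segment embeddings."""
    # 1. Gaussian (RBF) affinity between every pair of segments, zero diagonal
    sq_dist = np.sum((emb[:, None, :] - emb[None, :, :]) ** 2, axis=-1)
    affinity = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)

    # 2. Symmetrically normalized affinity D^{-1/2} A D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(affinity.sum(axis=1), 1e-12))
    norm_aff = affinity * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # 3. Eigenvectors of the k largest eigenvalues (eigh sorts ascending)
    _, eigvecs = np.linalg.eigh(norm_aff)
    x = eigvecs[:, -n_clusters:]

    # 4. Project each row onto the unit sphere
    y = x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)

    # 5. k-means on the rows of y, seeded by farthest-point initialization
    centers = [y[0]]
    for _ in range(1, n_clusters):
        d2 = np.min([np.sum((y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(y[np.argmax(d2)])
    centers = np.array(centers)

    labels = np.zeros(len(y), dtype=int)
    for _ in range(n_iter):
        dists = np.sum((y[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
        labels = np.argmin(dists, axis=1)
        new_centers = np.array([
            y[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Hypothetical example: two well-separated groups of 2-D "embeddings"
rng = np.random.default_rng(1)
emb = np.vstack([rng.normal(0.0, 0.1, (10, 2)), rng.normal(5.0, 0.1, (10, 2))])
print(spectral_cluster(emb, n_clusters=2))
```

Because no merge criterion is involved, the main choices are the affinity kernel and the number of clusters, which in practice must be estimated (for example, from the eigenvalue spectrum of the normalized affinity).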

Section: Introduction (mentioning)


Speaker-turn aware diarization for speech-based cognitive assessments

Xu, Ke, Mak et al.

2024

Front. Neurosci.


Introduction: Speaker diarization is an essential preprocessing step for diagnosing cognitive impairments from speech-based Montreal Cognitive Assessments (MoCA).

Methods: This paper proposes three enhancements to conventional speaker diarization methods for such assessments. The enhancements tackle the challenges of diarizing MoCA recordings on two fronts. First, a multi-scale channel-interdependence speaker embedding is used as the front-end speaker representation to overcome the acoustic mismatch caused by far-field microphones. Specifically, a squeeze-and-excitation (SE) unit and channel-dependent attention are added to Res2Net blocks for multi-scale feature aggregation. Second, a sequence comparison approach with a holistic view of the whole conversation is applied to measure the similarity of short speech segments in the conversation, which yields a speaker-turn aware scoring matrix for the subsequent clustering step. Third, to further improve diarization performance, we propose incorporating a pairwise similarity measure so that the speaker-turn aware scoring matrix contains both local and global information across the segments.

Results: Evaluations on an interactive MoCA dataset show that the proposed enhancements lead to a diarization system that outperforms conventional x-vector/PLDA systems under language-, age-, and microphone-mismatch scenarios.

Discussion: The results also show that the proposed enhancements can help hypothesize speaker-turn timestamps, making the diarization method amenable to datasets without timestamp information.
