Interspeech 2019
DOI: 10.21437/interspeech.2019-1903

The Second DIHARD Challenge: System Description for USC-SAIL Team

Abstract: In this paper, we describe components that form a part of USC-SAIL team's submissions to Track 1 and Track 2 of the second DIHARD speaker diarization challenge. We describe each module in our speaker diarization pipeline and explain the rationale behind our choice of algorithms for each module, while comparing the Diarization Error Rate (DER) against different module combinations. We propose a clustering scheme based on spectral clustering that yields competitive performance. Moreover, we introduce an overlap …

Cited by 6 publications (1 citation statement)
References 14 publications
“…In terms of clustering, most of the existing algorithms that have been used in speaker diarization are unsupervised. Among them, agglomerative hierarchical clustering (AHC) [3] and spectral clustering (SC) [16] using pairwise embedding similarity measurement techniques like cosine distance [6], [9], PLDA [17] and using an LSTM [10] are the most popular. Similarly, other unsupervised clustering methods such as Gaussian mixture model [4], [13], mean-shift [5], k-means [18], and links [19] have also been adopted for speaker diarization.…”
mentioning
confidence: 99%
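
As a rough illustration of the spectral clustering with cosine similarity mentioned in the statement above, the following is a minimal sketch in Python with NumPy. It is not the USC-SAIL system or any cited paper's method: the function names `cosine_affinity` and `spectral_cluster`, the rescaling of cosine similarity to [0, 1], the Ng-Jordan-Weiss style normalization, the farthest-point k-means initialization, and the assumption that the number of speakers is known in advance are all choices made here for illustration.

```python
import numpy as np


def cosine_affinity(embeddings):
    """Pairwise cosine similarity between rows, rescaled from [-1, 1] to [0, 1]."""
    x = np.asarray(embeddings, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return (x @ x.T + 1.0) / 2.0


def spectral_cluster(affinity, n_speakers, n_iter=50):
    """Illustrative normalized spectral clustering; n_speakers is assumed known."""
    # Symmetric normalization: D^{-1/2} A D^{-1/2}.
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    norm_aff = affinity * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order, so the last
    # n_speakers columns are the leading eigenvectors.
    _, vecs = np.linalg.eigh(norm_aff)
    spec = vecs[:, -n_speakers:]
    spec = spec / np.linalg.norm(spec, axis=1, keepdims=True)

    # k-means on the spectral embedding, with a deterministic
    # farthest-point initialization (an assumption of this sketch).
    centers = [spec[0]]
    for _ in range(1, n_speakers):
        dist = np.min([np.sum((spec - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(spec[np.argmax(dist)])
    centers = np.array(centers)

    for _ in range(n_iter):
        sq_dist = ((spec[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq_dist, axis=1)
        for k in range(n_speakers):
            if np.any(labels == k):
                centers[k] = spec[labels == k].mean(axis=0)
    return labels


if __name__ == "__main__":
    # Synthetic "segment embeddings": two hypothetical speakers, five segments each.
    rng = np.random.default_rng(0)
    speaker_a = np.array([1.0, 0.0, 0.0]) + 0.05 * rng.standard_normal((5, 3))
    speaker_b = np.array([0.0, 1.0, 0.0]) + 0.05 * rng.standard_normal((5, 3))
    segments = np.vstack([speaker_a, speaker_b])
    print(spectral_cluster(cosine_affinity(segments), n_speakers=2))
```

In a real diarization pipeline the rows would be speaker embeddings extracted from speech segments, and the number of speakers would usually have to be estimated (for example from the eigenvalue spectrum) rather than supplied.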