Interspeech 2019
DOI: 10.21437/interspeech.2019-2813

Bayesian HMM Based x-Vector Clustering for Speaker Diarization

Cited by 60 publications (54 citation statements); references 10 publications.
“…An HMM is used to refine the clusters after an initial run of AHC. Previous approaches to using an HMM for diarisation have used the speaker embeddings as the observed variable [12,13]. This report proposes to incorporate the spatial location of the current active speaker into the HMM, by having the SSL vector as an additional observed variable.…”
Section: Hidden Markov Model Diarisation (mentioning)
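To make "speaker embeddings as the observed variable" concrete, here is a minimal sketch under simplifying assumptions: each HMM state is one speaker from the initial AHC clustering, each state emits per-segment embeddings from an isotropic Gaussian around that speaker's mean, and the speaker-change behaviour is controlled by a fixed, hypothetical self-loop probability `p_loop`. This is plain Viterbi decoding of a simple HMM, not the Bayesian (variational) HMM of the titled paper, and it does not include the additional SSL observation proposed in the citing report; the function names are illustrative only.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def log_gaussian(x: Vector, mean: Vector, var: float) -> float:
    """Log-density of an isotropic Gaussian N(mean, var * I) evaluated at x."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * (d * math.log(2 * math.pi * var) + sq / var)


def viterbi_resegment(embeddings: List[Vector], means: List[Vector],
                      var: float = 1.0, p_loop: float = 0.9) -> List[int]:
    """Most likely speaker index for each segment embedding (hypothetical helper).

    States are speakers. A speaker is kept with probability p_loop (must be < 1);
    otherwise the model switches uniformly to one of the other speakers.
    """
    if not embeddings:
        return []
    n_spk = len(means)
    if n_spk == 1:
        return [0] * len(embeddings)
    log_stay = math.log(p_loop)
    log_switch = math.log((1.0 - p_loop) / (n_spk - 1))

    # Uniform initial distribution over speakers.
    score = [-math.log(n_spk) + log_gaussian(embeddings[0], m, var) for m in means]
    backptr: List[List[int]] = []
    for x in embeddings[1:]:
        new_score, ptrs = [], []
        for j in range(n_spk):
            # Best predecessor state for speaker j at this step.
            best_i = max(range(n_spk),
                         key=lambda i: score[i] + (log_stay if i == j else log_switch))
            trans = log_stay if best_i == j else log_switch
            new_score.append(score[best_i] + trans + log_gaussian(x, means[j], var))
            ptrs.append(best_i)
        score = new_score
        backptr.append(ptrs)

    # Trace back from the best final state.
    state = max(range(n_spk), key=lambda j: score[j])
    path = [state]
    for ptrs in reversed(backptr):
        state = ptrs[state]
        path.append(state)
    return path[::-1]
```

With the default `p_loop=0.9`, a single ambiguous segment surrounded by segments of one speaker is absorbed into that speaker instead of triggering two speaker changes; lowering `p_loop` toward chance lets each segment follow its best-matching speaker independently.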
“…Furthermore, each AHC merger iteration requires an optimisation of the HMM to be run, using the Expectation-Maximisation (EM) algorithm, which can be computationally expensive. Work in [13] separates out the HMM from the AHC. AHC is stopped when a measured similarity between the remaining clusters falls below a threshold.…”
Section: Introduction (mentioning)
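The threshold-based stopping rule for AHC described in this excerpt can be illustrated with a small sketch. It assumes cosine similarity between embeddings and average linkage between clusters; the cited work [13] may use a different similarity (for example PLDA scores) and a different linkage, so the function `ahc_threshold` and its `threshold` parameter are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; both vectors must be non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def ahc_threshold(embeddings: List[Sequence[float]], threshold: float) -> List[int]:
    """Average-linkage AHC that stops when the best pair similarity < threshold.

    Returns one label per embedding, numbered 0..K-1 in order of first appearance.
    """
    clusters: List[List[int]] = [[i] for i in range(len(embeddings))]
    sim = [[cosine(a, b) for b in embeddings] for a in embeddings]

    def linkage(c1: List[int], c2: List[int]) -> float:
        # Average pairwise similarity between members of the two clusters.
        return sum(sim[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, bi, bj = -math.inf, -1, -1
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = linkage(clusters[i], clusters[j])
                if s > best:
                    best, bi, bj = s, i, j
        if best < threshold:
            break  # remaining clusters are too dissimilar to merge
        clusters[bi].extend(clusters[bj])
        del clusters[bj]  # bj > bi, so index bi is still valid

    labels = [0] * len(embeddings)
    for k, members in enumerate(sorted(clusters, key=min)):
        for i in members:
            labels[i] = k
    return labels
```

Because the HMM is not run inside the merge loop, each iteration costs only similarity lookups; the resulting labels could then serve as the initial speaker clusters for a separate HMM refinement pass.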
“…Among all pyannote.audio alternatives, it is the most similar: written in Python, it provides most of the aforementioned blocks, and goes all the way down to the actual evaluation of the system.…”
Section: Introduction (mentioning)
“…In most of the approaches, this stage is still based on Agglomerative Hierarchical Clustering (AHC) [10] in combination with Probabilistic Linear Discriminant Analysis (PLDA) scoring [11]. Recently, spectral clustering [12] [13] and variational Bayesian clustering [14,15] have been introduced, showing promising results. Alternatives to PLDA scoring have also been introduced, using neural networks that learn how to score two speech segments [16], based on siamese networks [17] or Bi-LSTMs [18].…”
Section: Introduction (mentioning)