The Speaker and Language Recognition Workshop (Odyssey 2018) 2018
DOI: 10.21437/odyssey.2018-21

Speaker Diarization based on Bayesian HMM with Eigenvoice Priors

Cited by 55 publications
(49 citation statements)
“…Depicted in Figure 4, our proposed approach relies heavily on the i-vector-based Variational Bayes Hidden Markov Model (VB-HMM) introduced for speaker diarization in [14], and applied to resegmentation in [15]. We use the output of the speaker diarization baseline as the binary initialization of the per-frame speaker posterior matrix: Q_st is initialized to 1 if speaker s is responsible for the speech at the voiced frame t, and 0 otherwise.…”
Section: Principle (mentioning)
confidence: 99%
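
The binary initialization described in this statement can be sketched as follows. This is a minimal illustration, not the citing authors' code: the function name `init_speaker_posteriors`, the use of `None` to mark unvoiced frames, and the list-of-lists layout are all assumptions made for the example.

```python
from typing import List, Optional


def init_speaker_posteriors(
    frame_labels: List[Optional[int]], num_speakers: int
) -> List[List[float]]:
    """Build a binary (speakers x voiced frames) matrix Q from baseline labels.

    frame_labels holds one baseline speaker index per frame, or None for an
    unvoiced frame (an assumed encoding). Only voiced frames get a column.
    """
    voiced = [s for s in frame_labels if s is not None]
    q = [[0.0] * len(voiced) for _ in range(num_speakers)]
    for t, s in enumerate(voiced):
        if not 0 <= s < num_speakers:
            raise ValueError(f"speaker index {s} out of range")
        q[s][t] = 1.0  # Q_st = 1 for the speaker the baseline assigned to frame t
    return q


# Hypothetical baseline output: two speakers, one unvoiced frame.
labels = [0, 0, None, 1, 1, 0]
Q = init_speaker_posteriors(labels, num_speakers=2)
```

Each column of `Q` contains a single 1, so it is a valid (degenerate) posterior over speakers for that voiced frame, which a VB-HMM style resegmentation could then soften over its iterations.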
“…We first perform resegmentation using [14]'s VB-HMM module. Feature vectors for the module are length-60 MFCCs with deltas and double deltas, extracted in 10ms steps with a 25ms window.…”
Section: Implementation Details (mentioning)
confidence: 99%
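
To make the framing arithmetic in this statement concrete, here is a rough sketch. The 25 ms window, 10 ms step, and 60-dimensional output come from the quoted text; the 16 kHz sample rate, the split into 20 static coefficients plus deltas and double deltas, and the simple central-difference delta are assumptions chosen for illustration, and real toolkits typically use a regression-based delta instead.

```python
from typing import List


def frame_count(num_samples: int, sample_rate: int,
                win_ms: float = 25.0, step_ms: float = 10.0) -> int:
    """Number of full analysis windows that fit in the signal (no padding)."""
    win = int(round(sample_rate * win_ms / 1000))
    step = int(round(sample_rate * step_ms / 1000))
    if num_samples < win:
        return 0
    return 1 + (num_samples - win) // step


def simple_delta(frames: List[List[float]]) -> List[List[float]]:
    """Central difference over time with edge replication (a simplification)."""
    n = len(frames)
    out = []
    for t in range(n):
        prev, nxt = frames[max(t - 1, 0)], frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def stack_with_deltas(static: List[List[float]]) -> List[List[float]]:
    """Concatenate static, delta, and double-delta coefficients per frame."""
    d1 = simple_delta(static)
    d2 = simple_delta(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


# One second of audio at an assumed 16 kHz sample rate.
n_frames = frame_count(16000, 16000)

# Placeholder static features: 3 frames of 20 coefficients each (assumed split).
static = [[float(t)] * 20 for t in range(3)]
feats = stack_with_deltas(static)
```

With these assumptions, each frame in `feats` has 60 values, matching the feature length quoted above.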
“…At a high level, the system performs diarization by dividing each recording into short overlapping segments, extracting x-vectors [33,34], scoring with probabilistic linear discriminant analysis (PLDA) [35], and clustering using agglomerative hierarchical clustering (AHC) [36]. In contrast to the original JHU system, we omit the Variational Bayes resegmentation step [37]. The trained models are distributed through GitHub.…”
Section: Diarization (mentioning)
confidence: 99%
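
The clustering stage named in this statement can be illustrated with a small sketch of agglomerative hierarchical clustering over a pairwise score matrix. The quoted text only says that PLDA scores feed AHC; the average-linkage rule, the stopping threshold, the function name `ahc`, and the toy scores below are assumptions for the example, not details of the cited system.

```python
from typing import List


def ahc(scores: List[List[float]], threshold: float) -> List[List[int]]:
    """Average-linkage AHC over a symmetric pairwise similarity matrix.

    Repeatedly merges the most similar pair of clusters and stops once the
    best average similarity falls below `threshold` (an assumed stop rule).
    """
    clusters = [[i] for i in range(len(scores))]

    def avg_sim(a: List[int], b: List[int]) -> float:
        return sum(scores[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, bi, bj = max(
            (avg_sim(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < threshold:
            break
        clusters[bi] = clusters[bi] + clusters[bj]
        del clusters[bj]  # bj > bi, so index bi is unaffected
    return clusters


# Toy similarity scores for four segments (stand-ins for PLDA scores).
toy_scores = [
    [0.0, 5.0, -3.0, -3.0],
    [5.0, 0.0, -3.0, -3.0],
    [-3.0, -3.0, 0.0, 4.0],
    [-3.0, -3.0, 4.0, 0.0],
]
speaker_clusters = ahc(toy_scores, threshold=0.0)
```

Here segments 0 and 1 end up in one cluster and segments 2 and 3 in another, since the remaining cross-cluster similarity is below the threshold.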
“…In most of the approaches, this stage is still based on the Agglomerative Hierarchical Clustering (AHC) [10] in combination with Probabilistic Linear Discriminant Analysis (PLDA) scoring [11]. Recently, spectral clustering [12] [13], and variational Bayesian clustering [14,15] have been introduced, showing promising results. Also, alternatives to the PLDA scoring have been introduced using neural networks that learn how to score two speech segments [16], using siamese networks [17] or Bi-LSTMs [18].…”
Section: Introduction (mentioning)
confidence: 99%