2006
DOI: 10.1007/11965152_32

Technical Improvements of the E-HMM Based Speaker Diarization System for Meeting Records

Abstract: This paper is concerned with the speaker diarization task in the specific context of meeting room recordings. Firstly, different technical improvements of an E-HMM based system are proposed and evaluated in the framework of the NIST RT'06S evaluation campaign. Related experiments show absolute gains of 6.4% and 12.9% in overall speaker diarization error rate (DER) on the development and evaluation corpora, respectively. Secondly, this paper presents an original strategy to deal with th…

Cited by 18 publications (11 citation statements)
References 9 publications
“…Model-based approaches tend to have better performances and rely on a two-class detector, with models pre-trained with external speech and non-speech data [6], [41], [49], [51], [52]. Speech and non-speech models may optionally be adapted to specific meeting conditions [15].…”
Section: B Speech Activity Detection
Citation type: mentioning
confidence: 99%
“…Gauvain et al [1998] and Sinha et al [2005] represented each fixed length window with a Gaussian distribution and measured the distance between those windows using divergence. The clustering step is achieved by a bottom-up fashion, that is, agglomerative clustering [Gauvain et al 1998; Ajmera and Wooters 2003; Reynolds and Torres-Carrasquillo 2004; Ben et al 2004], or a top-down manner, that is, divisive clustering (e.g., evolutive hidden Markov models in Fredouille and Senay [2006]). …”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Mostly metric-based segmentation and clustering is being used [Cas04,vL06, [Cas04] have been used for segmentation or clustering. A hierarchical top-down clustering approach was taken by [FS07] using an HMM adding new states (representing speakers) at each iteration while re-segmenting the speech data. In [ZBLG07, AWP07] a bottom-up approach was taken, also using an HMM to realign the data after each iteration.…”
Section: Assessment of Speaker Diarization Systems
Citation type: mentioning
confidence: 99%
“…Some systems just picked a single channel [vL06, ZBLG07], while others segmented each channel separately [JLSW04] before combining the results or performed some form of pre-processing [FS07, AWP07] to combine the channels into one single recording.…”
Section: NIST Benchmark Series for Speaker Diarization
Citation type: mentioning
confidence: 99%
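
Illustrative note: the Related Work citation statement above describes representing each fixed-length window with a Gaussian distribution and measuring the distance between windows using divergence. The following is a minimal sketch of that idea, not the method of any cited paper. It assumes diagonal-covariance Gaussians and the symmetric Kullback-Leibler divergence as the distance; the excerpt names neither, and the function names `fit_diag_gaussian`, `symmetric_kl`, and `window_distances` are hypothetical.

```python
from statistics import fmean, pvariance


def fit_diag_gaussian(window, var_floor=1e-6):
    """Fit one diagonal Gaussian (assumption) to a window of feature frames.

    `window` is a list of frames; each frame is a list of floats.
    Variances are floored so the divergence below never divides by zero.
    """
    dims = list(zip(*window))  # one tuple of values per feature dimension
    means = [fmean(d) for d in dims]
    variances = [max(pvariance(d, mu), var_floor) for d, mu in zip(dims, means)]
    return means, variances


def symmetric_kl(g1, g2):
    """Symmetric KL divergence, KL(p||q) + KL(q||p), for diagonal Gaussians."""
    (m1, v1), (m2, v2) = g1, g2
    total = 0.0
    for a, va, b, vb in zip(m1, v1, m2, v2):
        diff2 = (a - b) ** 2
        total += va / vb + vb / va - 2.0 + diff2 * (1.0 / va + 1.0 / vb)
    return 0.5 * total


def window_distances(frames, win_len):
    """Distance between each pair of adjacent, non-overlapping windows."""
    starts = range(0, len(frames) - win_len + 1, win_len)
    models = [fit_diag_gaussian(frames[s:s + win_len]) for s in starts]
    return [symmetric_kl(a, b) for a, b in zip(models, models[1:])]
```

In a sketch like this, a large distance between adjacent windows would suggest a candidate speaker change, while small distances would mark windows that a clustering step might merge.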