2006
DOI: 10.1007/11965152_23

Speaker Diarization for Multi-microphone Meetings Using Only Between-Channel Differences

Abstract: We present a method to extract speaker turn segmentation from multiple distant microphones (MDM) using only delay values found via a cross-correlation between the available channels. The method is robust against the number of speakers (which is unknown to the system), the number of channels, and the acoustics of the room. The delays between channels are processed and clustered to obtain a segmentation hypothesis. We have obtained a 31.2% diarization error rate (DER) for the NIST's RT05s MDM conference…
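The delay-extraction step described in the abstract can be illustrated with a minimal sketch. It assumes plain (unweighted) cross-correlation over fixed-length frames and a fixed reference channel; the paper's actual weighting, frame sizes, and post-processing are not given here, and the function names `estimate_delay` and `delay_vectors` and all parameters are hypothetical. The clustering of the resulting delay vectors is not shown.

```python
import numpy as np


def estimate_delay(ref, sig, max_lag):
    """Return the lag in samples (within +/- max_lag) that maximizes the
    cross-correlation of sig against ref. Positive means sig lags ref."""
    ref = np.asarray(ref, dtype=float)
    sig = np.asarray(sig, dtype=float)
    # mode="full" yields one value per lag from -(len(ref)-1) to len(sig)-1.
    xcorr = np.correlate(sig, ref, mode="full")
    lags = np.arange(-(len(ref) - 1), len(sig))
    keep = np.abs(lags) <= max_lag
    return int(lags[keep][np.argmax(xcorr[keep])])


def delay_vectors(channels, frame_len, hop, max_lag, ref_index=0):
    """Build one delay vector per frame: the delay of every non-reference
    channel relative to the reference channel (an assumed setup)."""
    channels = np.asarray(channels, dtype=float)
    n_ch, n_samp = channels.shape
    others = [c for c in range(n_ch) if c != ref_index]
    starts = range(0, n_samp - frame_len + 1, hop)
    out = np.zeros((len(starts), len(others)), dtype=int)
    for i, s in enumerate(starts):
        ref = channels[ref_index, s:s + frame_len]
        for j, c in enumerate(others):
            out[i, j] = estimate_delay(ref, channels[c, s:s + frame_len], max_lag)
    return out
```

Each row of the returned array is one frame's vector of between-channel delays; in a system of the kind the abstract describes, such vectors would be the input to the clustering stage.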

citations
Cited by 30 publications
(33 citation statements)
references
References 6 publications
“…Nonspeech frames, estimated previously, are excluded from the subsequent process. The vector of delays is then fed into the aforementioned segmentation and agglomerative clustering module instead of the acoustic vectors [36]. We experimented with several values for the segmentation and agglomerative clustering parameters such as the initial number of mixtures per cluster and the number of initial clusters as mentioned in the previous section.…”
Section: Baseline
mentioning
confidence: 99%
“…Experiments have shown that as stand alone features [7], TDOA performs poorly respect to MFCC but significant performance improvements are obtained when TDOA are used in combination with MFCC [8], [9].…”
mentioning
confidence: 99%
“…Conventional speaker diarization systems [1] use an ergodic Hidden Markov Model (HMM) with speakers as HMM states. Good results were achieved by the systems using the combination of Mel-Frequency Cepstral Coefficients (MFCC) and Time Difference of Arrival (TDOA) features [2] with arrays composed of different number of microphones, while performance of standalone TDOA features was estimated as poor in respect to MFCC [3]. TDOA features can be used without prior knowledge of the geometry of the microphone array.…”
Section: Introduction
mentioning
confidence: 99%