2006 IEEE Odyssey - The Speaker and Language Recognition Workshop 2006
DOI: 10.1109/odyssey.2006.248125

Speaker Segmentation and Clustering using Gender Information

Abstract: Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing this collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden to Department of Defense, Washington Headquarters Services, Directorate for Infor…

Cited by 2 publications (5 citation statements)
References 13 publications
“…The training conditions all involved four-wire (two-channel) conversations and were defined by the following amounts of data: (1) an excerpt estimated to contain approximately 10 seconds of speech of the target on its designated side (designated as 10sec4w), or (2) one five-minute conversation (designated as 1conv4w). The AFRL/IEC system submitted for the conditions requiring speaker segmentation and clustering is described in [2]. The GMM-based systems, regardless of feature set, all used Version 2.1 of the MIT Lincoln Laboratory (MIT-LL) MFCC/GMM system [5] with 2048 mixtures per model and diagonal covariance matrices for each mixture.…”
mentioning
confidence: 99%
“…the target on its designated side (designated as 10sec4w). The GMM-based systems, regardless of feature set, all used Version 2.1 of the MIT Lincoln Laboratory (MIT-LL) MFCC/GMM system [5] with 2048 mixtures per model and diagonal covariance matrices for each mixture. In addition to the speech files, NIST provided transcripts produced by an English-language speech recognition system from BBN with word error rates typically in the range of 15–30% for English conversational telephone speech. All of the GMM-based systems used a common speech activity detector (SAD), which worked in three stages. The first stage utilized a two-state speech/non-speech Hidden Markov Model (HMM) with MFCCs as…”
mentioning
confidence: 99%
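The excerpt above describes a classic GMM speaker-modeling recipe: MFCC feature vectors modeled by a Gaussian mixture with diagonal covariance matrices, trained per speaker. The following is a minimal illustrative sketch of fitting such a model with EM; it is not the MIT-LL code, and where the cited system used 2048 mixtures on real MFCCs, this uses a handful of components on stand-in random features:

```python
import numpy as np

def fit_diag_gmm(x, k, iters=50, seed=0):
    """EM for a Gaussian mixture with diagonal covariances.
    x: (n_frames, dim) feature matrix; returns (weights, means, variances)."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w = np.full(k, 1.0 / k)                      # mixture weights
    mu = x[rng.choice(n, k, replace=False)]      # means initialized from data
    var = np.tile(x.var(axis=0), (k, 1)) + 1e-3  # diagonal variances
    for _ in range(iters):
        # E-step: log density of each frame under each component, plus log weight
        logp = (-0.5 * (np.log(2 * np.pi * var).sum(axis=1)
                        + (((x[:, None, :] - mu) ** 2) / var).sum(axis=2))
                + np.log(w))
        logp -= logp.max(axis=1, keepdims=True)  # stabilize before exponentiating
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)        # responsibilities
        # M-step: re-estimate weights, means, and diagonal variances
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r.T @ x) / nk[:, None]
        var = (r.T @ (x ** 2)) / nk[:, None] - mu ** 2 + 1e-6
    return w, mu, var

# Tiny demo on synthetic two-cluster "features" standing in for MFCC frames.
rng = np.random.default_rng(1)
feats = np.vstack([rng.normal(-3, 1, (200, 4)), rng.normal(3, 1, (200, 4))])
weights, means, variances = fit_diag_gmm(feats, k=2)
print(weights.round(3), means.shape)
```

In a verification system of the kind quoted, per-frame log-likelihoods under a speaker's GMM would then be scored against a background model; that scoring step is omitted here.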
“…Application examples of the above include telephony mixed-channel speaker verification [Ore et al., 2006; Deng et al., 2006] and acoustic model adaptation for speech recognition [Pusateri & Hazen, 2002; Hain et al., 2006; Janin et al., 2006]. In the mixed-channel speaker verification task, speaker-specific models after training are used to perform verification against a designated reference model.…”
Section: Speaker Clustering
mentioning
confidence: 99%
“…Segmentation is performed at these locations, and further speaker clustering can then be done to determine the identity of the speakers present. This is the strategy used in papers such as [Siu et al., 1992; Wegmann et al., 1999b; Kemp et al., 2000; Ore et al., 2006]. In [Wegmann et al., 1999b], an amplitude-based silence detector is used as a first pass to break up continuous broadcast news recordings into segments.…”
Section: Segmentation Using Silence
mentioning
confidence: 99%
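The first-pass strategy described in this excerpt, an amplitude (energy) silence detector that splits a long recording into segments at sufficiently long pauses, can be sketched as follows. This is an illustrative sketch, not the detector from [Wegmann et al., 1999b]; the frame size, dB threshold, and minimum-silence duration are assumed parameters:

```python
import numpy as np

def split_on_silence(samples, rate, frame_ms=20, thresh_db=-35, min_sil_ms=300):
    """Amplitude-based first-pass segmentation (illustrative sketch).
    Returns (start, end) sample indices of non-silent segments."""
    frame = int(rate * frame_ms / 1000)
    n = len(samples) // frame
    x = np.asarray(samples[: n * frame], dtype=float).reshape(n, frame)
    # Per-frame RMS energy in dB relative to the loudest frame.
    rms = np.sqrt((x ** 2).mean(axis=1)) + 1e-12
    db = 20 * np.log10(rms / rms.max())
    speech = db > thresh_db
    # Cut only at silences of at least min_sil_ms, so short pauses
    # inside an utterance do not split a segment.
    min_sil = max(1, min_sil_ms // frame_ms)
    segments, start, sil = [], None, 0
    for i, s in enumerate(speech):
        if s:
            if start is None:
                start = i
            sil = 0
        elif start is not None:
            sil += 1
            if sil >= min_sil:
                segments.append((start * frame, (i - sil + 1) * frame))
                start, sil = None, 0
    if start is not None:
        segments.append((start * frame, n * frame))
    return segments

# Demo: 2 s of tone, 1 s of silence, 2 s of tone at a 1 kHz sample rate.
t = np.arange(2000)
tone = 0.5 * np.sin(2 * np.pi * 100 * t / 1000)
samples = np.concatenate([tone, np.zeros(1000), tone])
print(split_on_silence(samples, rate=1000))  # two segments split at the pause
```

In the cited pipeline, the segments produced by this kind of detector would then be handed to a clustering stage to assign speaker identities.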