Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing this collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden to Department of Defense, Washington Headquarters Services, Directorate for Information Operations and Reports (0704-0188), 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302. Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to any penalty for failing to comply with a collection of information if it does not display a currently valid OMB control number. PLEASE DO NOT RETURN YOUR FORM TO THE ABOVE ADDRESS.

ABSTRACT
This paper considers the segmentation and clustering of conversational speech for the two-wire training (3conv2w) and two-wire testing (1conv2w) conditions of the NIST 2005 Speaker Recognition Evaluation. A notable feature of the system described is that each file is labeled as containing either opposite- or same-gender speakers. The speech segments for opposite-gender files are clustered by gender, while those for same-gender files are processed by agglomerative clustering. By using gender information in the clustering of the opposite-gender files, the equal error rate in the 3conv2w training condition was reduced from 15.2% to 9.9%. For the 1conv2w testing condition, clustering opposite-gender files by gender did not improve performance over agglomerative clustering; however, it was over 100 times faster than agglomerative clustering on the opposite-gender files.

... [4], is to use the Bayesian Information Criterion (BIC) to detect change points. This method is based on modeling the audio stream as a Gaussian process and using a maximum likelihood approach to detect potential changes. There have been numerous adaptations of the BIC procedure, including the DISTBIC procedure of [2] and the modified BIC procedure of [5] that removes the necessity for having to choose the threshold in the penalty term. In [6, 7], gender and channel detection have been used in the first stages of segmentation for speaker diarization of news broadcasts.

Clustering can be defined as grouping homogeneous ...
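The BIC change-point test mentioned in the excerpt above can be illustrated with a minimal sketch. It assumes the common textbook form of the criterion, in which one full-covariance Gaussian over a window is compared against two Gaussians split at a candidate frame; this is not necessarily the exact variant used in the paper or in [2], [4], or [5]. The function names, the `margin` parameter, and the `penalty_weight` parameter (the threshold in the penalty term) are hypothetical and chosen for illustration only.

```python
import numpy as np


def delta_bic(frames, t, penalty_weight=1.0):
    """Delta-BIC for a hypothesized change at frame index t.

    frames: (N, d) array of feature vectors (an assumption: e.g. cepstra).
    Positive values favor two Gaussians, i.e. a change point at t.
    """
    n, d = frames.shape
    left, right = frames[:t], frames[t:]

    def logdet_cov(x):
        # Maximum-likelihood full covariance (bias=True divides by N).
        cov = np.cov(x, rowvar=False, bias=True)
        _, logdet = np.linalg.slogdet(np.atleast_2d(cov))
        return logdet

    # Log-likelihood gain from splitting one Gaussian into two.
    gain = 0.5 * (n * logdet_cov(frames)
                  - len(left) * logdet_cov(left)
                  - len(right) * logdet_cov(right))
    # Penalty for the extra mean and covariance parameters.
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    return gain - penalty


def best_change_point(frames, margin=20, penalty_weight=1.0):
    """Return (index, delta_bic) of the best candidate inside the window.

    margin keeps enough frames on each side for a stable covariance.
    """
    candidates = range(margin, len(frames) - margin)
    scores = [delta_bic(frames, t, penalty_weight) for t in candidates]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]
```

A window would be declared to contain a change only when the returned score is positive. Raising `penalty_weight` makes the test more conservative, which is the tuning burden that the modified procedure of [5] is said to remove.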
... Evaluation (SRE) has added an optional unsupervised speaker adaptation track where test files are processed sequentially and one may update the target model. In this paper, various model adaptation techniques are implemented using a supervised (ideal) adaptation scheme. Once the best performing model adaptation method is found, unsupervised adaptation experiments are run using a threshold to determine when to update the target model. Three NIST training conditions, 10sec4w, 1conv4w, and 8conv4w, all with the 1conv4w test condition, are used for experiments with the NIST 2005 SRE. MinDCF values for the three training conditions are reduced from 0.0708 to 0.0277 for 10sec4w, from 0.0385 to 0.0199 for 1conv4w, and from 0.0264 to 0.0176 for 8conv4w using the supervised adaptation compared to the baseline. For the unsupervised adaptation, minDCF values were reduced to 0.0590, 0.0302, and 0.0210 for the respective training conditions.

One way that has been proposed for obtaining additional training data for speaker models is to use data from on-line test cases that score highly against the respective claimant models [1][2][3][4][5][6][7][8]. This unsupervised adaptation procedure has received considerable attention for text-dependent applications [1][2][3][4][5][6] and for applications in which the number of impostor trials is considerably lower than the number of true claimant trials [4][5][6].

Two notable studies [7, 8] have considered scenarios derived from NIST SRE databases, which involve text-independent verification with a large ratio (approximately 10:1) of impostor tests to true claimant tests. In both [7, 8], the NIST 2002 SRE database was used to synthetically create the adaptation testing paradigm, and both efforts showed a benefit from using unsupervised adaptation. Since 2004, NIST has provided test control files to allow for the possibility of running systems in an unsupervised adaptation mode in the annual SRE without the need for synthetically-generated test control files. In addition to using the 2002 database, [8] also used the 2004
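The threshold-gated update described in the abstract above (process test files sequentially, and update the target model only when a test scores highly enough) can be sketched as follows. This is a toy illustration under loud assumptions: the "model" is a single mean vector, the score is a negative squared distance, and the update is a count-weighted running mean. None of these is the paper's actual model, score, or adaptation method, and all function and parameter names are hypothetical.

```python
import numpy as np


def score(model_mean, features):
    """Toy similarity score (assumption): negative mean squared distance."""
    return -float(np.mean(np.sum((features - model_mean) ** 2, axis=1)))


def adapt(model_mean, model_count, features):
    """Count-weighted running-mean update, a stand-in for real adaptation."""
    new_count = model_count + len(features)
    new_mean = (model_count * model_mean + features.sum(axis=0)) / new_count
    return new_mean, new_count


def run_trials(model_mean, model_count, test_files, threshold):
    """Score each test file with the current model, then update the model
    only if the score exceeds the adaptation threshold."""
    results = []
    for features in test_files:
        s = score(model_mean, features)
        accepted = s > threshold
        results.append((s, accepted))
        if accepted:
            model_mean, model_count = adapt(model_mean, model_count, features)
    return model_mean, model_count, results
```

Each trial is scored before any update, so a file never influences its own score. The threshold trades off the benefit of extra claimant data against the risk of adapting on impostor data, which matters most when impostor tests heavily outnumber true claimant tests.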