ABSTRACT
This paper describes the one-speaker detection systems submitted by AFRL/HEC for several of the training and testing conditions in the 2005 NIST Speaker Recognition Evaluation. For each condition, the overall system score was the weighted combination of scores from several component systems. The component systems were based on (1) mel-frequency cepstral coefficients (MFCCs) and Gaussian mixture models (GMMs); (2) MFCCs and phoneme-specific GMMs (PS-GMMs); (3) linear-prediction-based cepstral coefficients (LPCCs) from closed-phase analysis; (4) formant center frequencies, formant bandwidths, and fundamental frequency (FMBWFO); and (5) word language modeling (WLM). The score combination was done using single-layer perceptrons, with the grouping of the component systems depending on the lengths of the training and testing files. For some of the testing and/or training conditions involving ten-second speech files, system performance improved with the inclusion of the FMBWFO and LPCC systems, while the MFCC/PS-GMM system provided additional benefits in the one-conversation testing conditions involving larger amounts of training data.

The WLM system applied language modeling to the words in speech recognition transcripts. For testing or training conditions involving short speech files, the scores from the MFCC, FMBWFO, and LPCC systems were combined using a single-layer perceptron (SLP). For testing and training conditions involving larger amounts of speech data, the score combination was done in two stages. First, the scores from fifteen PS-GMM systems were combined using an SLP. Then, the output score from ...
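The abstract describes fusion as a weighted combination of component-system scores computed by single-layer perceptrons. The Python sketch below illustrates that idea under assumptions the source does not state: the perceptron has one sigmoid output unit trained by batch gradient descent on cross-entropy (which makes it equivalent to logistic regression), and the fused detection score is its linear activation. The names `ScoreFusionSLP`, `make_synthetic_trials`, and `accuracy` are hypothetical, and the trial scores are synthetic toy data, not NIST SRE results.

```python
import math
import random
from typing import List, Sequence, Tuple


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class ScoreFusionSLP:
    """Hypothetical single-layer perceptron that fuses per-system scores.

    Assumption: one sigmoid output unit trained by batch gradient descent on
    cross-entropy (i.e. logistic regression). The fused detection score is
    the linear activation w . s + b.
    """

    def __init__(self, n_inputs: int) -> None:
        self.weights = [0.0] * n_inputs
        self.bias = 0.0

    def fuse(self, scores: Sequence[float]) -> float:
        # Weighted combination of the component-system scores for one trial.
        return sum(w * s for w, s in zip(self.weights, scores)) + self.bias

    def fit(self, trials: List[Sequence[float]], labels: List[int],
            lr: float = 0.1, epochs: int = 500) -> None:
        # labels: 1 for target (same-speaker) trials, 0 for non-target trials.
        n = len(trials)
        for _ in range(epochs):
            grad_w = [0.0] * len(self.weights)
            grad_b = 0.0
            for scores, y in zip(trials, labels):
                err = _sigmoid(self.fuse(scores)) - y
                for i, s in enumerate(scores):
                    grad_w[i] += err * s
                grad_b += err
            self.weights = [w - lr * g / n for w, g in zip(self.weights, grad_w)]
            self.bias -= lr * grad_b / n


def make_synthetic_trials(n_systems: int, n_per_class: int,
                          seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    # Toy data only: target scores ~ N(+1, 1), non-target scores ~ N(-1, 1).
    rng = random.Random(seed)
    trials: List[List[float]] = []
    labels: List[int] = []
    for label, mean in ((1, 1.0), (0, -1.0)):
        for _ in range(n_per_class):
            trials.append([rng.gauss(mean, 1.0) for _ in range(n_systems)])
            labels.append(label)
    return trials, labels


def accuracy(model: ScoreFusionSLP, trials, labels) -> float:
    # Decide "target" when the fused score is positive.
    hits = sum((model.fuse(s) > 0) == bool(y) for s, y in zip(trials, labels))
    return hits / len(labels)


# Short-file condition: fuse three scores (MFCC, FMBWFO, LPCC systems).
short_trials, short_labels = make_synthetic_trials(n_systems=3, n_per_class=100)
short_slp = ScoreFusionSLP(n_inputs=3)
short_slp.fit(short_trials, short_labels)

# First stage of the long-file condition: fuse fifteen PS-GMM scores into one.
ps_trials, ps_labels = make_synthetic_trials(n_systems=15, n_per_class=100, seed=1)
stage1_slp = ScoreFusionSLP(n_inputs=15)
stage1_slp.fit(ps_trials, ps_labels)
stage1_scores = [stage1_slp.fuse(s) for s in ps_trials]
```

The same class covers both the single-stage fusion for short files and the first stage of the two-stage fusion. How the second stage combines `stage1_scores` with other scores is cut off in the source text, so that step is deliberately not reconstructed here.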