2013 IEEE International Conference on Acoustics, Speech and Signal Processing 2013
DOI: 10.1109/icassp.2013.6638972

Developing a speaker identification system for the DARPA RATS project

Abstract: This paper describes the speaker identification (SID) system developed by the Patrol team for the first phase of the DARPA RATS (Robust Automatic Transcription of Speech) program, which seeks to advance state-of-the-art detection capabilities on audio from highly degraded communication channels. We present results using multiple SID systems differing mainly in the algorithm used for voice activity detection (VAD) and feature extraction. We show that (a) unsupervised VAD performs as well as supervised methods in t…

Cited by 31 publications (28 citation statements)
References 17 publications
“…Trial scoring was performed by means of PLDA trained for full-rank speaker- and channel-covariance matrices. For a detailed description, see [18].…”
Section: System Description (mentioning)
confidence: 99%
“…[2]. In the later stages, various designs of robust features [3] are used in combination with normalization techniques such as cepstral mean and variance normalization or short-time Gaussianization [4].…”
Section: Introduction (mentioning)
confidence: 99%
“…In this application it is very important that the robot knows who it is interacting with, such that the interaction can be tailored specifically to that person, thus increasing the likelihood of sustaining the interaction. The two most common modalities for person identification are audio ([4], [5], [6]) and vision ([7], [8], [9]). Both speaker recognition and face recognition perform well separately under ideal conditions; however, in situations where both modalities are available, which is typically the case for robots, there is an improvement to be gained by fusing the two modalities [10].…”
Section: Introduction (mentioning)
confidence: 99%