Speaker identification on the SCOTUS corpus
Yuan & Liberman (2008)
DOI: 10.1121/1.2935783

Abstract: This paper reports the results of our experiments on speaker identification in the SCOTUS corpus, which includes oral arguments from the Supreme Court of the United States. Our main findings are as follows: 1) a combination of Gaussian mixture models and monophone HMMs attains near-100% text-independent identification accuracy on utterances that are longer than one second; 2) a sampling rate of 11025 Hz achieves the best performance (higher sampling rates are harmful); and a sampling rate as low as 2000…
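The abstract's core technique, text-independent identification with per-speaker Gaussian mixture models, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the MFCC features, the librosa/scikit-learn APIs, the mixture size, and the file layout are all assumptions for the example; only the 11025 Hz sampling rate comes from the abstract.

```python
# Minimal sketch of GMM-based text-independent speaker identification.
# Feature choice (MFCCs), model size, and libraries are illustrative
# assumptions, not the authors' setup.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

SR = 11025  # the sampling rate the paper found to perform best

def mfcc_frames(wav_path):
    """Load audio at 11025 Hz and return one MFCC vector per frame."""
    y, _ = librosa.load(wav_path, sr=SR)
    return librosa.feature.mfcc(y=y, sr=SR, n_mfcc=13).T  # (frames, 13)

def train_speaker_models(train_files):
    """train_files: dict mapping speaker -> list of wav paths
    (a hypothetical data layout)."""
    models = {}
    for speaker, paths in train_files.items():
        X = np.vstack([mfcc_frames(p) for p in paths])
        models[speaker] = GaussianMixture(
            n_components=32, covariance_type="diag", random_state=0
        ).fit(X)
    return models

def identify(models, wav_path):
    """Pick the speaker whose GMM gives the highest average
    log-likelihood over the test utterance's frames."""
    X = mfcc_frames(wav_path)
    return max(models, key=lambda s: models[s].score(X))
```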

Cited by 464 publications (309 citation statements)
References 10 publications (9 reference statements)

Citation statements (ordered by relevance):
“…The recordings were automatically segmented using the Penn Phonetics Lab Forced Aligner (P2FA; Yuan & Liberman, 2008), and the boundaries of the target speech sounds were then manually adjusted following the recommendations listed in Machač & Skarnitzl (2009). By target sounds we mean two consecutive phones, the word-final obstruent and the initial sound of the following word; since the voicing of the former may not be independent of that of the latter in Czech speakers of English, we were interested in both of them.…”
Section: Methods (mentioning)
confidence: 99%
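For readers unfamiliar with the tool the quoted studies rely on: P2FA is run on a wav file plus a transcript and writes a Praat TextGrid with word- and phone-level interval tiers, whose boundaries can then be hand-corrected as the study above describes. A minimal sketch, assuming P2FA's documented align.py command line and hypothetical file names:

```python
# Sketch of running P2FA from Python: automatic alignment first,
# manual boundary correction in Praat afterwards. The align.py CLI
# shape is taken from P2FA's documentation; paths are placeholders.
import subprocess

def p2fa_align(wav_path, transcript_path, textgrid_out):
    """Force-align a transcript to audio; P2FA writes a Praat
    TextGrid with word- and phone-level interval tiers."""
    subprocess.run(
        ["python", "align.py", wav_path, transcript_path, textgrid_out],
        check=True,
    )

p2fa_align("utt001.wav", "utt001.txt", "utt001.TextGrid")
# The phone tier of utt001.TextGrid then provides initial boundaries
# for the target word-final obstruent and the following word-initial
# sound, which the authors adjusted by hand.
```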
“…The subtitles in BBC videos are not broadcast in sync with the audio. The Penn Phonetics Lab Forced Aligner [17,40] is used to force-align the subtitles to the audio signal. Errors exist in the alignment because the transcript is not verbatim; the aligned labels are therefore filtered by checking them against the commercial IBM Watson Speech to Text service.…”
Section: Dataset (mentioning)
confidence: 99%
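The filtering step described above, keeping only aligned labels that an independent ASR transcript confirms, can be sketched as below. The matching rule, window size, and data shapes are illustrative assumptions; a plain word list stands in for the IBM Watson Speech to Text output rather than any real API call.

```python
# Hedged sketch of cross-checking forced-aligned labels against an
# independent ASR transcript. Everything here is an illustrative
# assumption about the data shapes, not the cited paper's code.
def filter_aligned_words(aligned, asr_words, window=3):
    """aligned: list of (word, start_s, end_s) from the forced aligner.
    asr_words: ASR word sequence for the same clip.
    Keep an aligned word only if the same word appears within
    `window` positions of its expected index in the ASR sequence."""
    kept = []
    for i, (word, start, end) in enumerate(aligned):
        lo, hi = max(0, i - window), i + window + 1
        if word.lower() in (w.lower() for w in asr_words[lo:hi]):
            kept.append((word, start, end))
    return kept
```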
“…Linear regression was used to fit multivariate temporal response functions (TRFs) between the low-frequency EEG and each representation of the speech stimulus. The phonemic (Ph) representation was computed using forced alignment (Yuan and Liberman 2008), given a speech file and the corresponding orthographic transcription broken into 26 phonemes of the International Phonetic Alphabet (IPA). A multivariate time series composed of 26 indicator variables was then obtained.…”
Section: EEG Data Analysis (mentioning)
confidence: 99%
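A hedged sketch of the quoted TRF pipeline: build a 26-column phoneme indicator time series from forced-alignment intervals, expand it with time lags, and fit a regularized linear regression to the EEG. The lag range, EEG sampling rate, and ridge penalty are assumptions for illustration (ridge with alpha=0 reduces to the plain least squares the quote names); only the 26-phoneme indicator representation comes from the quote.

```python
# Minimal sketch of fitting a multivariate temporal response function
# (TRF) between EEG and a phonemic indicator representation. The lag
# range, sampling rate, and penalty are illustrative assumptions.
import numpy as np

FS = 64            # EEG sampling rate in Hz (assumed)
N_PHONEMES = 26    # phoneme inventory size, per the quoted description

def phoneme_indicators(phone_intervals, n_samples, phoneme_index):
    """phone_intervals: list of (phoneme, start_s, end_s) from forced
    alignment. Returns an (n_samples, 26) 0/1 time series."""
    S = np.zeros((n_samples, N_PHONEMES))
    for ph, start, end in phone_intervals:
        S[int(start * FS):int(end * FS), phoneme_index[ph]] = 1.0
    return S

def lagged_design(S, max_lag):
    """Stack time-lagged copies of the stimulus:
    (n_samples, 26 * max_lag)."""
    n, d = S.shape
    X = np.zeros((n, d * max_lag))
    for lag in range(max_lag):
        X[lag:, lag * d:(lag + 1) * d] = S[:n - lag]
    return X

def fit_trf(S, eeg, max_lag=32, alpha=1.0):
    """Ridge regression from lagged phoneme indicators to each EEG
    channel; returns TRF weights of shape (26 * max_lag, n_channels)."""
    X = lagged_design(S, max_lag)
    XtX = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ eeg)
```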