Speaker identification on the SCOTUS corpus
Yuan & Liberman (2008)
DOI: 10.1121/1.2935783

Abstract: This paper reports the results of our experiments on speaker identification in the SCOTUS corpus, which includes oral arguments from the Supreme Court of the United States. Our main findings are as follows: 1) a combination of Gaussian mixture models and monophone HMMs attains near-100% text-independent identification accuracy on utterances that are longer than one second; 2) a sampling rate of 11025 Hz achieves the best performance (higher sampling rates are harmful); and a sampling rate as low as 2000…
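The abstract's core technique, text-independent identification with per-speaker Gaussian mixture models, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the MFCC features, the librosa/scikit-learn APIs, the mixture size, and the file layout are all assumptions for the example; only the 11025 Hz sampling rate comes from the abstract.

```python
# Minimal sketch of GMM-based text-independent speaker identification.
# Feature choice (MFCCs), model size, and libraries are illustrative
# assumptions, not the authors' setup.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

SR = 11025  # the sampling rate the paper found to perform best

def mfcc_frames(wav_path):
    """Load audio at 11025 Hz and return one MFCC vector per frame."""
    y, _ = librosa.load(wav_path, sr=SR)
    return librosa.feature.mfcc(y=y, sr=SR, n_mfcc=13).T  # (frames, 13)

def train_speaker_models(train_files):
    """train_files: dict mapping speaker -> list of wav paths
    (a hypothetical data layout)."""
    models = {}
    for speaker, paths in train_files.items():
        X = np.vstack([mfcc_frames(p) for p in paths])
        models[speaker] = GaussianMixture(
            n_components=32, covariance_type="diag", random_state=0
        ).fit(X)
    return models

def identify(models, wav_path):
    """Pick the speaker whose GMM gives the highest average
    log-likelihood over the test utterance's frames."""
    X = mfcc_frames(wav_path)
    return max(models, key=lambda s: models[s].score(X))
```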

Cited by 464 publications (309 citation statements)
References 10 publications (9 reference statements)

Citation statements (ordered by relevance):
“…The recordings were automatically segmented using the Penn Phonetics Lab Forced Aligner (P2FA; Yuan & Liberman, 2008), and the boundaries of the target speech sounds were then manually adjusted following the recommendations listed in Machač & Skarnitzl (2009). By target sounds we mean two consecutive phones, the word-final obstruent and the initial sound of the following word; since the voicing of the former may not be independent of that of the latter in Czech speakers of English, we were interested in both of them.…”
Section: Methods (mentioning)
confidence: 99%
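For readers unfamiliar with the tool the quoted studies rely on: P2FA is run on a wav file plus a transcript and writes a Praat TextGrid with word- and phone-level interval tiers, whose boundaries can then be hand-corrected as the study above describes. A minimal sketch, assuming P2FA's documented align.py command line and hypothetical file names:

```python
# Sketch of running P2FA from Python: automatic alignment first,
# manual boundary correction in Praat afterwards. The align.py CLI
# shape is taken from P2FA's documentation; paths are placeholders.
import subprocess

def p2fa_align(wav_path, transcript_path, textgrid_out):
    """Force-align a transcript to audio; P2FA writes a Praat
    TextGrid with word- and phone-level interval tiers."""
    subprocess.run(
        ["python", "align.py", wav_path, transcript_path, textgrid_out],
        check=True,
    )

p2fa_align("utt001.wav", "utt001.txt", "utt001.TextGrid")
# The phone tier of utt001.TextGrid then provides initial boundaries
# for the target word-final obstruent and the following word-initial
# sound, which the authors adjusted by hand.
```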
“…The subtitles in BBC videos are not broadcast in sync with the audio. The Penn Phonetics Lab Forced Aligner [17,40] is used to force-align the subtitles to the audio signal. Errors exist in the alignment because the transcript is not verbatim; the aligned labels are therefore filtered by checking them against the commercial IBM Watson Speech to Text service.…”
Section: Dataset (mentioning)
confidence: 99%
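The filtering step described above, keeping only aligned labels that an independent ASR transcript confirms, can be sketched as below. The matching rule, window size, and data shapes are illustrative assumptions; a plain word list stands in for the IBM Watson Speech to Text output rather than any real API call.

```python
# Hedged sketch of cross-checking forced-aligned labels against an
# independent ASR transcript. Everything here is an illustrative
# assumption about the data shapes, not the cited paper's code.
def filter_aligned_words(aligned, asr_words, window=3):
    """aligned: list of (word, start_s, end_s) from the forced aligner.
    asr_words: ASR word sequence for the same clip.
    Keep an aligned word only if the same word appears within
    `window` positions of its expected index in the ASR sequence."""
    kept = []
    for i, (word, start, end) in enumerate(aligned):
        lo, hi = max(0, i - window), i + window + 1
        if word.lower() in (w.lower() for w in asr_words[lo:hi]):
            kept.append((word, start, end))
    return kept
```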
“…Linear regression was used to fit multivariate temporal response functions (TRFs) between the low-frequency EEG and each representation of the speech stimulus. The phonemic (Ph) representation was computed using forced alignment (Yuan and Liberman 2008), given a speech file and the corresponding orthographic transcription broken into 26 phonemes of the International Phonetic Alphabet (IPA). A multivariate time series composed of 26 indicator variables was then obtained.…”
Section: EEG Data Analysis (mentioning)
confidence: 99%
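A hedged sketch of the quoted TRF pipeline: build a 26-column phoneme indicator time series from forced-alignment intervals, expand it with time lags, and fit a regularized linear regression to the EEG. The lag range, EEG sampling rate, and ridge penalty are assumptions for illustration (ridge with alpha=0 reduces to the plain least squares the quote names); only the 26-phoneme indicator representation comes from the quote.

```python
# Minimal sketch of fitting a multivariate temporal response function
# (TRF) between EEG and a phonemic indicator representation. The lag
# range, sampling rate, and penalty are illustrative assumptions.
import numpy as np

FS = 64            # EEG sampling rate in Hz (assumed)
N_PHONEMES = 26    # phoneme inventory size, per the quoted description

def phoneme_indicators(phone_intervals, n_samples, phoneme_index):
    """phone_intervals: list of (phoneme, start_s, end_s) from forced
    alignment. Returns an (n_samples, 26) 0/1 time series."""
    S = np.zeros((n_samples, N_PHONEMES))
    for ph, start, end in phone_intervals:
        S[int(start * FS):int(end * FS), phoneme_index[ph]] = 1.0
    return S

def lagged_design(S, max_lag):
    """Stack time-lagged copies of the stimulus:
    (n_samples, 26 * max_lag)."""
    n, d = S.shape
    X = np.zeros((n, d * max_lag))
    for lag in range(max_lag):
        X[lag:, lag * d:(lag + 1) * d] = S[:n - lag]
    return X

def fit_trf(S, eeg, max_lag=32, alpha=1.0):
    """Ridge regression from lagged phoneme indicators to each EEG
    channel; returns TRF weights of shape (26 * max_lag, n_channels)."""
    X = lagged_design(S, max_lag)
    XtX = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ eeg)
```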