Abstract: This paper introduces the new OLdenburg LOgatome speech corpus (OLLO) and outlines design considerations during its creation. OLLO is distinct from previous ASR corpora as it specifically targets (1) the fair comparison between human and machine speech recognition performance, and (2) the realistic representation of intrinsic variabilities in speech that are significant for automatic speech recognition (ASR) systems. To enable an unbiased human-machine comparison, OLLO is designed for recognition of individual…
“…to control stimulus delivery. The stimuli were the two syllables /ki/ and /ka/ and they were taken from the Oldenburg logatome speech corpus (OLLO; Wesker et al, 2005 ). The syllables were cut out of the available logatomes from one speaker (female speaker 1, V6 ‘normal spelling style’, no dialect).…”
“…We also used recordings taken from the OLLO database (Wesker et al, 2005), in German. The stimuli were already cut out.…”
Section: Discussion and Summary
“…Native speakers of English, French, Brazilian Portuguese, Turkish, Estonian and Bavarian-accented German recorded consonant-vowel-consonant (CVC) stimuli in carrier sentences. We also used recordings of six speakers taken from the OLLO database (Wesker et al, 2005), in standard German. Together, we use these CVC stimuli as the basis for items in both a discrimination and an assimilation experiment, described below.…”
Our native language influences the way we perceive speech sounds, affecting our ability to discriminate non-native sounds. We compare two ideas about the influence of the native language on speech perception: the Perceptual Assimilation Model, which appeals to a mental classification of sounds into native phoneme categories, versus the idea that rich, fine-grained phonetic representations tuned to the statistics of the native language are sufficient. We operationalize this idea using representations from two state-of-the-art speech models: a Dirichlet process Gaussian mixture model and the more recent wav2vec 2.0 model. We present a new, open dataset of French- and English-speaking participants' speech perception behaviour for 61 vowel sounds from six languages. We show that phoneme assimilation is a better predictor than fine-grained phonetic modelling, both for the discrimination behaviour as a whole and for predicting differences in discriminability associated with differences in native language background. We also show that wav2vec 2.0, while not good at capturing the effects of native language on speech perception, is complementary to information about native phoneme assimilation, and provides a good model of low-level phonetic representations, supporting the idea that both categorical and fine-grained perception are used during speech perception.
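The discrimination behaviour described above is typically scored with an ABX task: a trial counts as correct when the test token X lies closer, in some representation space, to the same-category token A than to the contrast token B. The following is a minimal sketch of that scoring logic with invented 2-D feature vectors — it is not the paper's actual pipeline, which uses DPGMM and wav2vec 2.0 representations and more refined distance measures:

```python
import math

def euclidean(u, v):
    # Euclidean distance between two equal-length feature vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def abx_correct(a, b, x):
    # ABX trial: X belongs to the same category as A; the trial is
    # "correct" if X lies closer to A than to B in feature space.
    return euclidean(x, a) < euclidean(x, b)

def abx_score(trials):
    # Fraction of ABX trials answered correctly (chance level = 0.5)
    return sum(abx_correct(a, b, x) for a, b, x in trials) / len(trials)

# Toy "representations": category A near (0, 0), category B near (3, 0)
trials = [
    ((0.0, 0.0), (3.0, 0.0), (0.2, 0.1)),   # X close to A -> correct
    ((0.1, 0.0), (3.1, 0.0), (0.3, -0.1)),  # X close to A -> correct
    ((0.0, 0.1), (2.9, 0.0), (2.8, 0.2)),   # X drifted toward B -> error
]
print(abx_score(trials))  # -> 0.666...
```

Representations that encode native-language category structure pull same-category tokens together, raising this score for native contrasts and lowering it for non-native ones.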
“…We designed stimuli that have formant patterns extracted from speech samples /gu/, /fu/, and /pu/ (Figure 1). Speech samples were extracted from the Oldenburg Logatome Corpus (OLLO) speech database [Wesker et al, 2005]. We chose VCV combination syllables with German speakers with no dialect.…”
Comodulated masking noise and binaural cues can facilitate detecting a target sound in noise. These cues can induce a decrease in detection thresholds, quantified as comodulation masking release (CMR) and binaural masking level difference (BMLD), respectively. However, their relevance to speech perception is unclear, as most studies have used artificial stimuli that differ from speech. Here, we investigated their ecological validity using sounds with speech-like spectro-temporal dynamics. We evaluated the ecological validity of this grouping effect with stimuli reflecting formant changes in speech. We set three masker bands at formant frequencies F1, F2, and F3 based on the CV combinations /gu/, /fu/, and /pu/. We found that the CMR was small (< 3 dB), while the BMLD was comparable to previous findings (∼ 9 dB). In conclusion, we suggest that other features, such as spectral proximity and the number of masker bands, may play a role in facilitating frequency grouping by comodulation.
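Both quantities in the abstract above are defined as drops in detection threshold, expressed in dB, so they are straightforward to compute once thresholds are measured. A minimal sketch (the threshold values below are invented for illustration, not taken from the study):

```python
def masking_release(reference_threshold_db, cue_threshold_db):
    # Masking release is the drop in detection threshold (in dB)
    # when a grouping or binaural cue is added; positive = benefit.
    return reference_threshold_db - cue_threshold_db

# Hypothetical detection thresholds (dB SPL) for a target in noise:
baseline = 62.0      # unmodulated, diotic masker (no cue)
comodulated = 59.5   # masker bands share a common envelope
dichotic = 53.0      # target phase differs between ears

cmr = masking_release(baseline, comodulated)   # -> 2.5 dB (small CMR)
bmld = masking_release(baseline, dichotic)     # -> 9.0 dB (typical BMLD)
print(f"CMR = {cmr:.1f} dB, BMLD = {bmld:.1f} dB")
```

The study's finding is that, with these speech-like formant-band maskers, the first difference stays under 3 dB while the second remains near the ∼9 dB reported for simpler stimuli.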