Speech perception consists of a set of computations that take continuously varying acoustic waveforms as input and generate, as output, discrete representations that make contact with the lexical representations stored in long-term memory. Because the perceptual objects recognized by the speech perception system enter into subsequent linguistic computation, the format used for lexical representation and processing fundamentally constrains the speech perceptual processes. Consequently, theories of speech perception must, at some level, be tightly linked to theories of lexical representation. Minimally, speech perception must yield representations that smoothly and rapidly interface with stored lexical items. Adopting the perspective of Marr, we argue for the following research programme and provide neurobiological and psychophysical evidence in its support. First, at the implementational level, speech perception is a multi-time resolution process, with perceptual analyses occurring concurrently on at least two time scales (approx. 20-80 ms and approx. 150-300 ms), commensurate with (sub)segmental and syllabic analyses, respectively. Second, at the algorithmic level, we suggest that perception proceeds on the basis of internal forward models, or uses an 'analysis-by-synthesis' approach. Third, at the computational level (in the sense of Marr), the theory of lexical representation that we adopt is principally informed by phonological research and assumes that words are represented in the mental lexicon as sequences of discrete segments composed of distinctive features. One important goal of the research programme is to develop linking hypotheses between putative neurobiological primitives (e.g. temporal primitives) and those primitives derived from linguistic inquiry, to arrive ultimately at a biologically sensible and theoretically satisfying model of representation and computation in speech.
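To make the multi-time resolution claim concrete, the following sketch (our own toy illustration, not the authors' model) analyzes the same synthetic waveform with two concurrent window lengths, roughly matching the (sub)segmental and syllabic time scales mentioned above; the sampling rate, window lengths, and signal are arbitrary choices made only for the example.

```python
# Toy illustration of multi-time resolution analysis (not the authors' model):
# the same waveform is analyzed concurrently with a short (~25 ms) and a
# long (~200 ms) window, loosely corresponding to (sub)segmental and
# syllabic time scales.
import numpy as np

fs = 16000                               # sampling rate in Hz (assumed)
t = np.arange(0, 1.0, 1.0 / fs)
# Synthetic "speech-like" signal: a 120 Hz carrier with a 4 Hz (syllable-rate)
# amplitude envelope.
signal = np.sin(2 * np.pi * 120 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 4 * t))

def short_time_energy(x, fs, win_ms, hop_ms=10):
    """RMS energy in successive windows of length win_ms, hopping by hop_ms."""
    win = int(fs * win_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    frames = [x[i:i + win] for i in range(0, len(x) - win, hop)]
    return np.array([np.sqrt(np.mean(f ** 2)) for f in frames])

segmental_scale = short_time_energy(signal, fs, win_ms=25)    # ~20-80 ms scale
syllabic_scale = short_time_energy(signal, fs, win_ms=200)    # ~150-300 ms scale
print(len(segmental_scale), len(syllabic_scale))
```

The point is only that the two window sizes yield two concurrent, differently grained descriptions of the same input, which is the sense of "multi-time resolution" at issue.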
The ability to discern the use of a nonstandard dialect is often enough information to also determine the speaker's ethnicity, and speakers may consequently suffer discrimination based on their speech. This article, detailing four experiments, shows that housing discrimination based solely on telephone conversations occurs, that dialect identification is possible from the single word hello, and that phonetic correlates of dialect can be discovered. In one experiment, a series of telephone surveys was conducted in which housing was requested from the same landlord during a short time period using standard and nonstandard dialects. The results demonstrate that landlords discriminate against prospective tenants on the basis of the sound of their voice during telephone conversations. Another experiment, conducted with untrained participants, confirmed this identification ability: listeners identified the dialects significantly better than chance. Phonetic analysis reveals variables that potentially distinguish the dialects.
Although some cochlear implant (CI) listeners can show good word recognition accuracy, it is not clear how they perceive and use the various acoustic cues that contribute to phonetic perception. In this study, the use of acoustic cues was assessed for normal-hearing (NH) listeners in optimal and spectrally degraded conditions, and also for CI listeners. Two experiments tested the tense/lax vowel contrast (varying in formant structure, vowel-inherent spectral change, and vowel duration) and the word-final fricative voicing contrast (varying in F1 transition, vowel duration, consonant duration, and consonant voicing). Identification results were modeled using mixed-effects logistic regression. These experiments suggested that under spectrally degraded conditions, NH listeners decrease their use of formant cues and increase their use of durational cues. Compared to NH listeners, CI listeners showed decreased use of spectral cues such as formant structure and formant change, as well as of consonant voicing, and showed greater use of durational cues (especially for the fricative contrast). The results suggest that although NH and CI listeners may show similar accuracy on basic tests of word, phoneme, or feature recognition, they may be using different perceptual strategies in the process.
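As a deliberately simplified illustration of how cue use can be quantified from identification responses, the sketch below fits a logistic regression to simulated data; the cue names, the simulated listener, and the coefficient values are hypothetical, and the study itself used mixed-effects models with listener-level terms rather than this pooled, fixed-effects version.

```python
# Simplified sketch of cue-weighting analysis (the study used mixed-effects
# logistic regression; this fixed-effects version only illustrates the idea).
# All column names and data below are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "duration": rng.normal(0.0, 1.0, n),   # standardized vowel-duration cue
    "formant": rng.normal(0.0, 1.0, n),    # standardized spectral (formant) cue
})
# Simulated listener whose responses weight the spectral cue more heavily.
p = 1 / (1 + np.exp(-(0.5 * df["duration"] + 1.5 * df["formant"])))
df["resp"] = rng.binomial(1, p)

# Larger fitted coefficients indicate heavier reliance on that cue.
model = smf.logit("resp ~ duration + formant", data=df).fit(disp=False)
print(model.params)
```

Comparing such coefficients across listener groups (or across degradation conditions) is one way to express the "different perceptual strategies" conclusion quantitatively.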
We present the results from an experiment that tests the perception of English consonantal sequences by Korean speakers, and we confirm that perceptual epenthesis in a second language (L2) arises from syllable structure restrictions of the first language (L1), rather than from linear co-occurrence restrictions. Our study replicates and extends Dupoux, Kakehi, Hirose, Pallier, & Mehler's (1999) results, which suggested that listeners perceive epenthetic vowels within consonantal sequences that violate the phonotactics of their L1. Korean employs at least two kinds of phonotactic restrictions: (i) syllable structure restrictions that prohibit the occurrence of certain consonants in coda position (e.g., *[c.], *[g.]), while allowing others (e.g., [k.], [l.]), and (ii) consonantal contact restrictions that ban the co-occurrence of certain heterosyllabic consonants (e.g., *[k.m], *[l.n]) due to various phonological processes that repair such sequences on the surface (i.e., /k.m/ --> [n.m]; /l.n/ --> [l.l]). The results suggest that Korean syllable structure restrictions, rather than consonantal contact restrictions, result in the perception of epenthetic vowels. Furthermore, the frequency of co-occurrence fails to explain the epenthesis effects in the perception of the consonant clusters employed in the present study. We address questions regarding the interaction between speech perception and phonology and test the validity of Steriade's (2001a, b) Perceptual-Mapping (P-Map) hypothesis for the Korean sonorant assimilation processes. Our results indicate that Steriade's hypothesis makes incorrect predictions about Korean phonology and that speech perception is not isomorphic to speech production.
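The distinction between the two restriction types can be made concrete with a toy checker; the consonant sets and banned contacts below are a drastically simplified, hypothetical stand-in for the Korean patterns described above, not an actual phonological grammar.

```python
# Toy illustration (hypothetical, heavily simplified) of the two kinds of
# phonotactic restrictions discussed above:
#   1. syllable structure restrictions: some consonants cannot occur in coda
#   2. consonantal contact restrictions: some heterosyllabic C1.C2 sequences
#      are banned even when each consonant is a legal coda/onset on its own.

LEGAL_CODAS = {"k", "l", "m", "n", "p", "t"}       # e.g. [k.], [l.] allowed
BANNED_CONTACTS = {("k", "m"), ("l", "n")}         # e.g. *[k.m], *[l.n]

def diagnose(coda, onset):
    """Classify a C1.C2 sequence by which restriction (if any) it violates."""
    if coda not in LEGAL_CODAS:
        return "syllable structure violation (illegal coda)"
    if (coda, onset) in BANNED_CONTACTS:
        return "consonantal contact violation"
    return "licit sequence"

for seq in [("g", "m"), ("k", "m"), ("k", "t")]:
    print(seq, "->", diagnose(*seq))
```

In these terms, the finding is that perceptual epenthesis tracks the first kind of violation (illegal codas) rather than the second (banned contacts).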
As part of knowledge of language, an adult speaker possesses information on which sounds are used in the language and on the distribution of these sounds in a multidimensional acoustic space. However, a speaker must know not only the sound categories of his language but also the functional significance of these categories, in particular, which sound contrasts are relevant for storing words in memory and which sound contrasts are not. Using magnetoencephalographic brain recordings with speakers of Russian and Korean, we demonstrate that a speaker's perceptual space, as reflected in early auditory brain responses, is shaped not only by bottom-up analysis of the distribution of sounds in his language but also by more abstract analysis of the functional significance of those sounds.

auditory cortex | native phonology | magnetoencephalography

Much research in speech perception has explored cross-language perceptual differences among speakers who have been exposed to different sets of sounds in their respective native languages. This body of work has found that effects of experience with different sound distributions are observed early in development (1-3) and are evident in early automatic brain responses in adults (e.g., ref. 4). In contrast, in this study we investigate how perceptual space is influenced by higher-level factors that are relevant for the encoding of words in long-term memory while holding constant the acoustic distribution of the sounds.

Recently, a number of proposals have suggested that the properties of a speaker's perceptual space can be derived from the distribution of sounds in acoustic space. According to such accounts, the learner discovers sound categories in the language input by identifying statistical peaks in the distribution of sounds in acoustic space. Recent evidence suggests that infants may indeed be able to carry out such distributional analyses (5, 6). However, such distributional analyses are of less use in determining how these sounds are linked to phonemes, the abstract sound-sized units that are used to encode words in memory. This is because there is not a one-to-one mapping between phoneme categories, the units used to store words, and the speech sound categories, sometimes known as phones, that are used to realize phonemes (7, 8). There are different possible mappings between phonemes and speech sounds, and therefore sets of sound categories with similar acoustic distributions may map onto different sets of phonemes across languages. A pair of sound categories in a language may be straightforwardly represented as a pair of different phonemes for purposes of word storage. Following standard notation, phonemes are represented by using slashes, and speech sounds/phones are represented by using square brackets, e.g., phoneme /p/ vs. speech sound [p]. For example, for an adult English speaker the first sound in words like pin or pat and the second sound in words like spin or spam correspond to the same phoneme /p/, and they are encoded identically in word storage. Yet word-initial ...
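The distributional-learning proposal mentioned in the text (discovering sound categories as statistical peaks in acoustic space) can be illustrated with a Gaussian mixture model fit to synthetic formant data; the two clusters and their formant values are invented for the example and are unrelated to the Russian and Korean stimuli used in the study.

```python
# Sketch of distributional category learning: recover "peaks" (mixture
# components) from a cloud of acoustic tokens. The formant values are
# synthetic and purely illustrative.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Two hypothetical vowel categories, each a cluster in (F1, F2) space (Hz).
cat_a = rng.normal(loc=[300, 2200], scale=[40, 120], size=(200, 2))
cat_b = rng.normal(loc=[600, 1200], scale=[50, 100], size=(200, 2))
tokens = np.vstack([cat_a, cat_b])

# A learner that only tracks the distribution can recover two components...
gmm = GaussianMixture(n_components=2, random_state=0).fit(tokens)
print(np.round(gmm.means_))
# ...but, as the text notes, this alone does not determine whether the two
# categories are distinct phonemes or contextual variants of one phoneme.
print(gmm.predict(tokens[:5]))
```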
This study examined stress processing among Mandarin and Korean second language learners of English and English monolinguals. While both English and Mandarin have contrastive stress at the word level, Korean does not. Consequently, Mandarin speakers may have an advantage over Korean speakers in English stress processing, even when matched for their general English proficiency. Experiment 1 assessed participants' stress encoding ability for nonwords in a short-term memory task. Experiment 2 examined the effect of stress in online word recognition in a lexical decision task by manipulating word frequency, stress location, and vowel quality. The results of both experiments support an advantage for English and Mandarin speakers over Korean speakers in stress processing of real words and nonwords. Only Korean speakers' lexical judgment of nonwords was modulated by word frequency, suggesting that they do not utilize stress in lexical access. Only English speakers' word recognition was facilitated by vowel quality changes. These results suggest that the ability of non-native speakers to process stress in their L2 is influenced by the characteristics of the stress systems in their L1.