This study was motivated by the prospective role played by brain rhythms in speech perception. The intelligibility – in terms of word error rate – of natural-sounding, synthetically generated sentences was measured using a paradigm that alters speech-energy rhythm over a range of frequencies. The material comprised 96 semantically unpredictable sentences, each approximately 2 s long (6–8 words per sentence), generated by a high-quality text-to-speech (TTS) synthesis engine. The TTS waveform was time-compressed by a factor of 3, creating a signal with a syllable rhythm three times faster than the original and whose intelligibility is poor (<50% words correct). A waveform with an artificial rhythm was produced by automatically segmenting the time-compressed waveform into consecutive 40-ms fragments, each followed by a silent interval. The parameters varied were the length of the silent interval (0–160 ms) and whether the lengths of silence were equal (‘periodic’) or not (‘aperiodic’). The performance curve (word error rate as a function of mean duration of silence) was U-shaped. The lowest word error rate (i.e., highest intelligibility) occurred when the silence was 80 ms long and inserted periodically. At this silence duration, word error rate increased when the silence was inserted aperiodically rather than periodically. These data are consistent with a model (TEMPO) in which low-frequency brain rhythms affect the ability to decode the speech signal. In TEMPO, optimum intelligibility is achieved when the syllable rhythm is within the range of the high theta-frequency brain rhythms (6–12 Hz), comparable to the rate at which segments and syllables are articulated in conversational speech.
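The fragmentation-and-silence manipulation described above can be sketched in a few lines of signal processing. The sketch below is a minimal illustration, not the authors' code: the function name `insert_silences` and the jitter scheme for the aperiodic condition are assumptions; the study specifies only 40-ms fragments, silent gaps of 0–160 ms, and periodic versus aperiodic gap lengths.

```python
import numpy as np

def insert_silences(signal, sr, frag_ms=40.0, silence_ms=80.0,
                    jitter_ms=0.0, rng=None):
    """Split `signal` into consecutive fragments of `frag_ms` milliseconds
    and append a silent gap after each one.  With jitter_ms == 0 the gaps
    are equal ('periodic'); otherwise each gap length is drawn uniformly
    from silence_ms +/- jitter_ms ('aperiodic', an assumed jitter scheme)."""
    rng = rng or np.random.default_rng(0)
    frag_len = int(round(sr * frag_ms / 1000.0))
    pieces = []
    for start in range(0, len(signal), frag_len):
        pieces.append(signal[start:start + frag_len])
        gap_ms = silence_ms
        if jitter_ms:
            gap_ms += rng.uniform(-jitter_ms, jitter_ms)
        pieces.append(np.zeros(int(round(sr * gap_ms / 1000.0)),
                               dtype=signal.dtype))
    return np.concatenate(pieces)

# Example: a 2-s time-compressed waveform at 16 kHz, periodic 80-ms gaps
sr = 16000
x = np.random.randn(2 * sr).astype(np.float32)
y = insert_silences(x, sr, frag_ms=40.0, silence_ms=80.0)
```

With 40-ms fragments and 80-ms gaps, the output is three times the input duration, restoring roughly the original syllable rate of the uncompressed speech, which is the point of the manipulation.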
Temporal properties associated with the speech signal are potentially important for understanding spoken language. Five hours of spontaneous American English dialogue material (from the SWITCHBOARD corpus) were hand-labeled and segmented at the phonetic-segment level; a forty-five-minute subset was also manually annotated (at the syllabic level) with respect to stress accent.
Current-generation automatic speech recognition (ASR) systems model spoken discourse as a linear sequence of words and phones. Because it is unusual for every phone within a word to be pronounced in a standard ("canonical") way, ASR systems often depend on a multi-pronunciation lexicon to match an acoustic sequence with a lexical unit. Since there are, in practice, many different ways for a word to be pronounced, this standard approach adds a layer of complexity and ambiguity to the decoding process which, if modified, could potentially improve recognition performance. Analysis of pronunciation variation in a corpus of spontaneous English discourse (Switchboard) demonstrates that the observed variation is largely systematic at the level of the syllable. Syllabic onsets are realized in canonical form far more frequently than either coda or nuclear constituents. Prosodic stress also plays an important role in pronunciation. The governing mechanism is likely to involve the informational valence associated with syllable elements, and for this reason pronunciation variation offers a potential window onto the mechanisms responsible for the production and understanding of speech.
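The core analysis here amounts to tallying, for each syllable position, how often the realized phone matches the canonical one. A minimal sketch of that tabulation is shown below; the toy `observations` records are invented for illustration, whereas in the actual study they would come from aligning canonical and hand-labeled phonetic transcriptions of Switchboard syllables.

```python
from collections import defaultdict

# Hypothetical toy records: (syllable position, realized-as-canonical flag).
observations = [
    ("onset", True), ("onset", True), ("onset", True), ("onset", False),
    ("nucleus", True), ("nucleus", False), ("nucleus", False),
    ("coda", True), ("coda", False), ("coda", False), ("coda", False),
]

counts = defaultdict(lambda: [0, 0])  # position -> [canonical count, total]
for pos, canonical in observations:
    counts[pos][1] += 1
    if canonical:
        counts[pos][0] += 1

# Canonical-realization rate per syllable position
rates = {pos: can / tot for pos, (can, tot) in counts.items()}
```

In this toy data the onset rate (0.75) exceeds the nucleus and coda rates, mirroring the pattern the abstract reports for the real corpus.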
1. Amplitude modulation (AM) is a pervasive property of acoustic communication systems. In the present study we investigate neural temporal mechanisms in the auditory nerve and cochlear nuclei of the pentobarbital-sodium-anesthetized cat associated with the neural coding of 100% AM tones, both in quiet and in the presence of wideband, quasi-flat-spectrum noise. The AM carrier frequency was set to the neuron's characteristic frequency (CF), and the sound pressure level (SPL) of acoustic stimuli was varied over a wide dynamic range of intensities (≤40 dB). The temporal AM-encoding capability of auditory neurons was measured by computing the synchronization coefficient (SC) of the neural response to the signal's modulation and carrier frequency. The temporal modulation transfer function (tMTF) of a neuron was then computed by measuring the SC of the response over a range of modulation frequencies (50–2550 Hz). 2. Neurons in the cochlear nuclei synchronize on average more highly to the modulation frequency than fibers of comparable CF, threshold, and spontaneous rate in the auditory nerve. The disparity in performance is greatest at high SPLs and low signal-to-noise ratios. However, there is a significant degree of diversity in AM-encoding capability among neurons in both the cochlear nuclei and auditory nerve. Among auditory nerve fibers (ANFs), low- and medium-spontaneous-rate (SR) units (SR < 18 spikes/s) phase-lock with greater precision than comparable high-SR units at any given frequency, particularly at moderate to high SPLs, consistent with previous studies. 3. The phase-locking capabilities of neurons in the cochlear nucleus are considerably more variable than in the auditory nerve. Moreover, the variability itself depends on two distinct measures of phase-locking performance. Most ANFs are capable of phase-locking to frequencies as high as 3–4 kHz. In the cochlear nucleus, many unit types do not phase-lock to modulation frequencies > 1 kHz.
As a result, phase-locking performance is measured on the basis of two parameters: maximum synchronization, irrespective of stimulus frequency, and the upper frequency limit for significant phase-locking. 4. Cochlear nucleus neurons may be divided into three distinct groups on the basis of maximum synchronization capability. In group 1 are the primary-like (PL) units of the anteroventral division, whose phase-locking capabilities are comparable with those of high-SR ANFs. (ABSTRACT TRUNCATED AT 400 WORDS)
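The synchronization coefficient used throughout this abstract is the standard vector-strength measure: the magnitude of the mean unit phasor of spike times evaluated at the stimulus (modulation or carrier) frequency. A minimal sketch, assuming spike times are given in seconds:

```python
import numpy as np

def synchronization_coefficient(spike_times, freq):
    """Vector strength of spike times relative to a sinusoid of
    frequency `freq` (Hz): |mean of exp(i * phase)| over all spikes.
    1.0 indicates perfect phase-locking; values near 0, no locking."""
    phases = 2.0 * np.pi * freq * np.asarray(spike_times, dtype=float)
    return float(np.abs(np.mean(np.exp(1j * phases))))

# Spikes landing at exactly the same phase of a 100-Hz modulator
locked = np.arange(50) / 100.0          # one spike per cycle
sc_locked = synchronization_coefficient(locked, 100.0)   # -> 1.0
```

Sweeping `freq` over modulation frequencies (e.g., 50–2550 Hz as in the study) and plotting the resulting SC values yields the temporal modulation transfer function (tMTF) described above.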