In this study, auditory stream segregation based on differences in the rate of envelope fluctuations, in the absence of spectral and temporal fine-structure cues, was tested. The sequences to be segregated were composed of fully amplitude-modulated (AM) bursts of broadband noise, A and B. All sequences were built by reiterating an ABA triplet in which the A modulation rate was fixed at 100 Hz and the B modulation rate was variable. The first experiment measured the threshold difference in AM rate at which subjects perceived the sequence as two streams rather than one. The results revealed that subjects generally perceived the sequences as a single perceptual stream when the difference in AM rate between the A and B noises was smaller than 0.75 oct, and as two streams when the difference was larger than about 1.00 oct. These streaming thresholds were substantially larger than, and unrelated to, the subjects' modulation-rate discrimination thresholds. A second experiment demonstrated that AM-rate-based streaming was adversely affected by decreases in AM depth, but that segregation remained possible as long as the AM of either the A or the B noises was above the subject's AM-detection threshold. A third experiment indicated that AM-rate-based streaming effects were still observed when the modulations applied to the A and B noises were set individually, either at a constant level in dB above AM-detection threshold or at levels at which they were of the same perceived strength. This finding suggests that AM-rate-based streaming is not necessarily mediated by perceived differences in AM depth. Altogether, the results of this study indicate that sequential sounds can be segregated on the sole basis of differences in the rate of their temporal fluctuations, in the absence of other temporal or spectral cues.
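As a concrete illustration, the ABA stimulus construction described above can be sketched in a few lines of NumPy. This is a minimal sketch under assumed parameter values: the burst duration, inter-burst gap, and sampling rate are illustrative choices, not values taken from the study. A modulation depth of 1.0 corresponds to the "fully amplitude-modulated" bursts, and setting the B rate one octave above A (100 Hz vs. 200 Hz) corresponds to a rate difference at which listeners typically reported two streams.

```python
import numpy as np

def am_noise_burst(am_rate_hz, dur_s=0.1, fs=44100, depth=1.0, seed=None):
    """Broadband noise burst with sinusoidal amplitude modulation.

    depth=1.0 gives full modulation (envelope reaches zero)."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(dur_s * fs)) / fs
    carrier = rng.standard_normal(t.size)                   # broadband noise carrier
    envelope = 1.0 + depth * np.sin(2 * np.pi * am_rate_hz * t)
    return carrier * envelope

def aba_sequence(a_rate=100.0, b_rate=200.0, n_triplets=5, fs=44100,
                 burst_s=0.1, gap_s=0.05):
    """Reiterate an A-B-A triplet of AM noise bursts (gap after each burst)."""
    gap = np.zeros(int(gap_s * fs))
    triplet = []
    for rate in (a_rate, b_rate, a_rate):                   # A, B, A
        triplet.append(am_noise_burst(rate, burst_s, fs))
        triplet.append(gap)
    return np.tile(np.concatenate(triplet), n_triplets)

# B modulated one octave above A (a difference above the ~1.00 oct threshold)
sig = aba_sequence(a_rate=100.0, b_rate=200.0)
```

The burst and gap durations here are hypothetical; in an actual experiment these, together with the AM depth, would be set according to the condition being tested.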
This special issue presents research concerning multistable perception in different sensory modalities. Multistability occurs when a single physical stimulus produces alternations between different subjective percepts. Multistability was first described for vision, where it occurs, for example, when different stimuli are presented to the two eyes or for certain ambiguous figures. It has since been described for other sensory modalities, including audition, touch and olfaction. The key features of multistability are: (i) stimuli have more than one plausible perceptual organization; (ii) these organizations are not compatible with each other. We argue here that most if not all cases of multistability are based on competition in selecting and binding stimulus information. Binding refers to the process whereby the different attributes of objects in the environment, as represented in the sensory array, are bound together within our perceptual systems, to provide a coherent interpretation of the world around us. We argue that multistability can be used as a method for studying binding processes within and across sensory modalities. We emphasize this theme while presenting an outline of the papers in this issue. We end with some thoughts about open directions and avenues for further research.
Congenital amusia is a neurogenetic disorder that affects music processing and that is ascribed to a deficit in pitch processing. We investigated whether this deficit extended to pitch processing in speech, notably the pitch changes used to contrast lexical tones in tonal languages. Congenital amusics and matched controls, all non-tonal language speakers, were tested for lexical tone discrimination in Mandarin Chinese (Experiment 1) and in Thai (Experiment 2). Tones were presented in pairs and participants were required to make same/different judgments. Experiment 2 additionally included musical analogs of Thai tones for comparison. Performance of congenital amusics was inferior to that of controls for all materials, suggesting a domain-general pitch-processing deficit. The pitch deficit of amusia is thus not limited to music, but may compromise the ability to process and learn tonal languages. Combined with acoustic analyses of the tone material, the present findings provide new insights into the nature of the pitch-processing deficit exhibited by amusics.
As previously suggested, attention may increase segregation via sensory enhancement and suppression mechanisms. To test this hypothesis, we used an interleaved-melody paradigm with two rhythm conditions, applied to familiar target melodies and unfamiliar distractor melodies sharing pitch and timbre properties. When the rhythms of both target and distractor were irregular, target melodies were identified above chance level. A sensory enhancement mechanism guided by listeners' knowledge may have helped to extract targets from the interleaved sequence. When the distractor was rhythmically regular, performance improved, suggesting that the distractor may have been suppressed by a sensory suppression mechanism.
Although segregation of both simultaneous and sequential speech items may be involved in the reception of speech in noisy environments, research on the latter is relatively sparse. Further, previous studies examining the ability of hearing-impaired listeners to form distinct auditory streams have produced mixed results. Finally, there is little work investigating streaming in cochlear implant recipients, who also have poor frequency resolution. The present study focused on the mechanisms involved in the segregation of vowel sequences and on potential limitations to segregation associated with poor frequency resolution. An objective temporal-order paradigm was employed in which listeners reported the order of constituent vowels within a sequence. Experiment 1 found that fundamental-frequency-based mechanisms contribute to segregation. In Experiment 2, the reduced frequency tuning often associated with hearing impairment was simulated in normal-hearing listeners; spectral smearing of the vowels increased accurate identification of their order, presumably by reducing the tendency to form separate auditory streams. These experiments suggest that a reduction in spectral resolution may result in a reduced ability to form separate auditory streams, which may contribute to the difficulties of hearing-impaired listeners, and probably cochlear implant recipients as well, in multi-talker cocktail-party situations.
The influence of hearing loss and aging on the perceptual organization of sound sequences was investigated by comparing young normal-hearing subjects with elderly subjects who had either impaired or normal hearing for their age. Subjects' ability to form perceptual auditory streams from sequences of harmonic complex tones was measured as a function of differences in fundamental frequency (F0). The sequences consisted of repeating triplets of harmonic complex tones separated by a silence (ABA-). In conditions in which the F0s of the A and B tones were so low that the harmonics could not be individually resolved by the peripheral auditory system, even in the young normal-hearing subjects, those subjects showed stream segregation performance similar to that of the elderly hearing-impaired subjects. In contrast, when the F0s of the tones were high enough for the harmonics to be largely resolved at the auditory periphery in normal-hearing subjects, but presumably unresolved in the elderly subjects, the former showed significantly more stream segregation than the latter. These results, which cannot be consistently explained in terms of age differences, suggest that auditory stream segregation is adversely affected by the reduced peripheral frequency selectivity of elderly individuals. This finding has implications for understanding the listening difficulties experienced by elderly individuals in cocktail-party situations.
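The ABA- sequences of harmonic complex tones described above can also be sketched with NumPy. This is a hedged sketch under assumed parameters (tone duration, silence duration, sampling rate, harmonic cutoff, and equal-amplitude harmonics are all illustrative choices, not values from the study); it simply shows how the B tone's F0 is offset from A's by a given number of octaves, with the trailing silence playing the role of the "-" in ABA-.

```python
import numpy as np

def harmonic_complex(f0_hz, dur_s=0.1, fs=44100, fmax_hz=5000.0):
    """Equal-amplitude harmonics of f0 up to an assumed cutoff frequency."""
    t = np.arange(int(dur_s * fs)) / fs
    n_harm = int(fmax_hz // f0_hz)
    tone = sum(np.sin(2 * np.pi * k * f0_hz * t) for k in range(1, n_harm + 1))
    return tone / n_harm                        # normalize by harmonic count

def aba_triplets(f0_a, delta_oct, n_triplets=5, fs=44100,
                 tone_s=0.1, gap_s=0.1):
    """Repeating ABA- triplets; B's F0 sits delta_oct octaves above A's."""
    f0_b = f0_a * 2.0 ** delta_oct
    parts = [harmonic_complex(f0, tone_s, fs) for f0 in (f0_a, f0_b, f0_a)]
    parts.append(np.zeros(int(gap_s * fs)))     # the '-' (silence) in ABA-
    return np.tile(np.concatenate(parts), n_triplets)

# A at 100 Hz, B half an octave higher (parameter values are hypothetical)
seq = aba_triplets(f0_a=100.0, delta_oct=0.5)
```

Whether the harmonics are resolved at the auditory periphery depends on the F0 relative to the listener's auditory filter bandwidths, not on anything in this synthesis; the sketch only controls the F0 separation between A and B.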
How do listeners accomplish the task of word segmentation, given that, in spoken language, there are no clear and obvious cues associated with word beginnings and ends? There is now a vast body of evidence showing that listeners use their tacit knowledge of a wide range of patterns in their native language to help them segment speech, including cues from allophonic variation, phonotactic constraints, transitional probabilities, and lexical stress (e.g., Cutler & Norris, 1988; McQueen, 1998; Quené, 1992; Saffran, Newport, & Aslin, 1996). Here, we examine the possibility that cues from intonation, the melodic structure of a language, can help listeners find word beginnings in the speech stream.

A given stretch of speech can be consistent with multiple lexical hypotheses, and these hypotheses can begin at different points in the input. In the French sequence l'abricot / / "the apricot," segmental information could be compatible with several competing hypotheses, such as l'abri / / "the shelter," la brique / / "the brick," and la brioche / / "the brioche." Listeners are routinely confronted with such transient segmentation ambiguities, and some ambiguities are total, as in Il m'a donné la fiche/l'affiche / / "He gave me the sheet/the poster," where there is no contextual information favoring one hypothesis over another, and the lexical hypotheses fiche "sheet" and affiche "poster" both seem to be equally supported by the (lack of) contextual information in the input.

Traditional psycholinguistic models, such as TRACE (McClelland & Elman, 1986) and Shortlist (Norris, 1994), have considered the processes underlying the mapping of sensory information from the acoustic input onto stored entries in the lexicon from such a phonemic approach. In these models, segmentation can be considered a by-product of lexical competition, in the sense that it is achieved by a process of competition between candidate words.
Lexical hypotheses that are consistent with the input are activated at any moment in time, regardless of their location in the input. Bottom-up activation and lateral inhibition among competing candidates allow listeners to resolve transient ambiguities by selecting candidates that account for the entire input (such as l'abricot rather than l'abri or la brique in / /). In addition, a "modified version" of the 1994 Shortlist model incorpo…

We investigated the use of language-specific intonational cues to word segmentation in French. Participants listened to phonemically identical sequences such as / /, C'est la fiche/l'affiche "It's the sheet/poster." We modified the f0 of the first vowel / / of the natural consonant-initial production la fiche, so that it was equal to that of the natural vowel-initial production l'affiche (resynth-consonant-equal condition), higher (resynth-consonant-higher condition), or lower (resynth-consonant-lower condition). In a two-alternative forced-choice task (Experiment 1), increasing the f0 in the / / of la fiche increased the percentage of vowel-initial (affiche) resp...