Listeners discriminate acoustic differences between phoneme categories better than similarly sized differences within phoneme categories. The question this paper aims to answer is how this pattern in perceptual sensitivity develops along an acoustic dimension that contrasts two non-native speech sounds: through acquired distinctiveness, through acquired similarity, or through a combination of the two. A pretest-training-posttest experiment was designed to study perceptual development directly, i.e., by including (i) a discrimination task to measure perceptual sensitivity, (ii) a transfer test to ensure language learning rather than stimulus learning, and (iii) a control group to rule out task repetition as an explanation of improvement. It is shown that the peak in perceptual sensitivity near a phoneme boundary that is typical of native listeners is not found in relatively inexperienced language learners, despite their ability to classify a continuum in a nativelike way after short laboratory training. Experiment II indicates that a discrimination peak may be achieved by language learners, but only after much more language experience than short-term laboratory training can offer. Furthermore, reasons are given why classification improvement in the laboratory should not be taken as evidence for (i) increased discrimination of the newly learned phonemes or (ii) learning of phoneme representations.
Although previous work has shown that some speech sounds contain more speaker-dependent information than others, little is known about the speaker information carried by the same segment in different linguistic contexts. The present study therefore investigated whether the Dutch fricatives /s/ and /x/ from telephone dialogues contain differential speaker information as a function of syllabic position and labial co-articulation. These linguistic effects, established in earlier work on read broadband speech, were investigated first. Using a corpus of Dutch telephone speech, results showed that the telephone bandwidth captures the expected effects of perseverative and anticipatory labialization for the back fricative /x/, whose spectral peaks fall within the telephone band, but not for the front fricative /s/, whose spectral peak falls outside the telephone band. Multinomial logistic regression shows that /s/ contains slightly more speaker information than /x/ in telephone speech and that speaker information is distributed across the speech signal in a systematic way: even though differences in classification accuracy were small, for both /s/ and /x/, codas and tokens with labial neighbours are more speaker-specific than onsets and tokens with non-labial neighbours. These findings indicate that the speaker information contained in the same speech sound is not the same across contexts.
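As a rough illustration of the kind of analysis summarized above (not the authors' actual pipeline), the following Python sketch fits a multinomial logistic regression that classifies speakers from fricative spectral measures; the file name and the feature columns (cog, peak_freq, spectral_tilt, duration) are hypothetical.

    # Illustrative sketch, not the authors' code: classifying speakers from
    # fricative spectral measures with multinomial logistic regression.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    tokens = pd.read_csv("fricative_measures.csv")   # hypothetical file
    X = tokens[["cog", "peak_freq", "spectral_tilt", "duration"]]  # hypothetical features
    y = tokens["speaker"]                            # one class per speaker

    # Multinomial logistic regression; speaker-classification accuracy
    # estimated with 5-fold cross-validation
    clf = LogisticRegression(multi_class="multinomial", max_iter=1000)
    scores = cross_val_score(clf, X, y, cv=5)
    print("mean speaker-classification accuracy:", scores.mean())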
We introduce a targeted language game approach using the visual-world eye-movement paradigm to assess when and how certain intonational contours affect the interpretation of utterances. We created a computer-based card game in which elliptical utterances such as "Got a candy" occurred with a nuclear contour most consistent with a yes-no question (H* H-H%) or a statement (L* L-L%). In Experiment 1 we explored how such contours are integrated online. In Experiment 2 we studied the expectations listeners have for how intonational contours signal intentions: do these reflect linguistic categories or rapid adaptation to the paradigm? Prosody had an immediate effect on interpretation, as indexed by the pattern and timing of fixations. Moreover, the association between different contours and intentions was quite robust in the absence of clear syntactic cues to sentence type and was not due to rapid adaptation. Prosody had immediate effects on interpretation even though there was a construction-based bias to interpret "got a" as a question. Taken together, we believe this paradigm will provide further insights into how intonational contours and their phonetic realization interact with other cues to sentence type in online comprehension.
As the amount of available video content increases, so does the need for better ways of browsing all this material. Because the nature of video makes it hard to process, adequate surrogates are needed that can readily be skimmed and browsed. In this paper, the effects of using hierarchy in a pictorial summary of keyframes are explored, and a novel type of video surrogate is presented: the VideoTree. Moreover, a prototype browser was developed and tested in a preliminary usability study, which showed that users performed better with the VideoTree browser than with a regular storyboard-based browser. They also found it more flexible, though more difficult to use.
It has been claimed that filled pauses are transferred from the first language (L1) into the second language (L2), suggesting that they are not directly learned by L2 speakers. This would make them usable for cross-linguistic forensic speaker comparisons. However, under the alternative hypothesis that vowels in the L2 are learnable, L2 speakers would adapt their pronunciation. This study investigated whether individuals remain consistent in their filled-pause realization across languages by comparing filled pauses (uh, um) in L1 Dutch and L2 English produced by 58 female speakers. Next to the effect of language, effects of the filled pause's position in the utterance were considered, as position is expected to affect the acoustics and also relates to fluency. Mixed-effects models showed that, whereas duration and fundamental frequency remained similar across languages, vowel realization was language-dependent. Speakers used um relatively more often in English than in Dutch, whereas previous research found speakers to be consistent in their um:uh ratio across languages. Results furthermore showed that filled-pause acoustics in the L1 and L2 depend on the position in the utterance. Because filled-pause realization is partially adapted to the L2, its use as a feature for cross-linguistic forensic speaker comparisons may be restricted.
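For readers curious what such an analysis could look like in code, here is a minimal sketch, assuming a long-format data set with hypothetical columns f0, language, position, and speaker; it fits a linear mixed-effects model with a per-speaker random intercept using statsmodels, which is only one of several ways the reported models could have been specified.

    # Illustrative sketch, not the authors' analysis: a linear mixed-effects
    # model for filled-pause acoustics with a random intercept per speaker.
    import pandas as pd
    import statsmodels.formula.api as smf

    fp = pd.read_csv("filled_pauses.csv")    # hypothetical long-format data

    # Fixed effects for language (L1 Dutch vs. L2 English) and utterance
    # position; the grouping variable gives each speaker her own intercept.
    model = smf.mixedlm("f0 ~ language * position", data=fp, groups=fp["speaker"])
    result = model.fit()
    print(result.summary())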
In whispered speech, the fundamental frequency is absent as the main cue to pitch. This study investigated how different pitch targets can be coded acoustically in whispered relative to normal speech. Secondary acoustic correlates that are found in normal speech may be preserved in whisper; alternatively, whispering speakers may provide compensatory information. Compared to earlier studies, a more comprehensive set of acoustic correlates (duration, intensity, formants, center of gravity, spectral balance) and a larger set of materials were included. To elicit maximal acoustic differences among the low, mid, and high pitch targets, linguistic and semantic load were minimized: 12 native Dutch speakers produced the point vowels (/a, i, u/) in nonsense vowel-consonant-vowel targets (with C = {/s/, /f/}). Acoustic analyses showed that, in addition to the systematic changes in formants reported previously, center of gravity, spectral balance, and intensity also varied with pitch target, both in whispered and in normal speech. Some acoustic correlates differed more in whispered than in normal speech, suggesting that speakers can adopt a compensatory strategy when coding pitch in the speech mode that lacks the main cue. Speakers furthermore varied in the extent to which particular correlates were used, and in the combination of correlates they altered systematically.
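As an illustration of two of the spectral correlates mentioned above, the sketch below computes the spectral center of gravity (power-weighted mean frequency) and a simple high-versus-low spectral-balance measure for a single token; the 1 kHz split frequency and the file name are assumptions for illustration, not the authors' measurement settings.

    # Illustrative sketch of two spectral measures for one vowel token.
    import numpy as np
    import soundfile as sf

    signal, fs = sf.read("vowel_token.wav")          # hypothetical mono recording
    spectrum = np.abs(np.fft.rfft(signal)) ** 2      # power spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)

    # Center of gravity: power-weighted mean frequency
    cog = np.sum(freqs * spectrum) / np.sum(spectrum)

    # Spectral balance: energy above versus below an assumed 1 kHz boundary (dB)
    low = spectrum[freqs < 1000].sum()
    high = spectrum[freqs >= 1000].sum()
    balance_db = 10 * np.log10(high / low)

    print(f"CoG = {cog:.0f} Hz, spectral balance = {balance_db:.1f} dB")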