Between-speaker variability of acoustically measurable speech rhythm (%V, ∆V(ln), ∆C(ln), and ∆Peak(ln)) and speech rate (rateCV) was investigated when within-speaker variability of (a) articulation rate and (b) linguistic structural characteristics was introduced. To study (a), 12 speakers of Standard German read 7 lexically identical sentences under five different intended tempo conditions (very slow, slow, normal, fast, very fast). To study (b), 16 speakers of Zurich Swiss German produced 16 spontaneous utterances each (256 in total), for which transcripts were made and then read by all speakers (4096 sentences; 16 speakers × 256 sentences). Between-speaker variability was tested using ANOVA with repeated measures on within-speaker factors. Results revealed strong and consistent between-speaker variability, while within-speaker variability as a function of articulation rate and linguistic characteristics was typically not significant. We concluded that between-speaker variability of acoustically measurable speech rhythm is strong and robust against various sources of within-speaker variability. Idiosyncratic articulatory movements were found to be the most likely factor explaining between-speaker differences.
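As a rough illustration, %V and a ∆(ln)-style measure can be computed from segmented vocalic and consonantal interval durations. This is a minimal sketch with names of my own choosing; interval segmentation is assumed to have been done elsewhere:

```python
import math

def rhythm_metrics(vocalic, consonantal):
    """Compute %V and log-duration variability (Delta-ln) rhythm metrics.

    vocalic / consonantal: lists of interval durations in seconds.
    Returns (%V, Delta-V(ln), Delta-C(ln)).
    """
    total = sum(vocalic) + sum(consonantal)
    percent_v = 100.0 * sum(vocalic) / total

    def delta_ln(intervals):
        # population standard deviation of log-transformed durations
        logs = [math.log(d) for d in intervals]
        mean = sum(logs) / len(logs)
        return math.sqrt(sum((x - mean) ** 2 for x in logs) / len(logs))

    return percent_v, delta_ln(vocalic), delta_ln(consonantal)
```

The log transform makes the variability measure rate-independent, which matters here because the tempo conditions deliberately manipulate articulation rate.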
Intensity contours of speech signals were sub-divided into positive and negative dynamics. Positive dynamics were defined as the speed of increases in intensity from amplitude troughs to subsequent peaks, and negative dynamics as the speed of decreases in intensity from peaks to troughs. Mean, standard deviation, and sequential variability were measured for both dynamics in each sentence. Analyses showed that measures of both dynamics were separately classified and between-speaker variability was largely explained by measures of negative dynamics. This suggests that parts of the signal where intensity decreases from syllable peaks are more speaker-specific. Idiosyncratic articulation may explain such results.
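The positive and negative dynamics described above can be sketched as slopes between successive troughs and peaks of the intensity contour. The following minimal Python example assumes peak/trough detection has already been performed; the function and variable names are illustrative, not the study's:

```python
import statistics

def intensity_dynamics(events):
    """Split an intensity contour into positive and negative dynamics.

    events: chronological list of (time_s, intensity_dB, kind) tuples,
    with kind alternating between 'trough' and 'peak'. The slope between
    consecutive events is a positive dynamic (trough -> peak) or a
    negative dynamic (peak -> trough). Returns, for each direction,
    (mean, standard deviation, sequential variability) of the slopes
    in dB/s, with sequential variability computed as the mean absolute
    difference between successive slopes (a raw PVI).
    """
    pos, neg = [], []
    for (t0, i0, k0), (t1, i1, _) in zip(events, events[1:]):
        slope = (i1 - i0) / (t1 - t0)
        (pos if k0 == 'trough' else neg).append(slope)

    def summarise(slopes):
        seq_var = statistics.mean(
            abs(a - b) for a, b in zip(slopes, slopes[1:]))
        return statistics.mean(slopes), statistics.stdev(slopes), seq_var

    return summarise(pos), summarise(neg)
```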
Integrating visual and auditory language information is critical for reading. Suppression and congruency effects in audiovisual paradigms with letters and speech sounds have provided information about low-level mechanisms of grapheme-phoneme integration during reading. However, the central question about how such processes relate to reading entire words remains unexplored. Using ERPs, we investigated whether audiovisual integration occurs for words already in beginning readers, and if so, whether this integration is reflected by differences in map strength or topography (aim 1); and moreover, whether such integration is associated with reading fluency (aim 2). A 128-channel EEG was recorded while 69 monolingual (Swiss)-German speaking first-graders performed a detection task with rare targets. Stimuli were presented in blocks either auditorily (A), visually (V) or audiovisually (matching: AVM; nonmatching: AVN). Corresponding ERPs were computed, and unimodal ERPs summated (A + V = sumAV). We applied TANOVAs to identify time windows with significant integration effects: suppression (sumAV-AVM) and congruency (AVN-AVM). They were further characterized using GFP and 3D-centroid analyses, and significant effects were correlated with reading fluency. The results suggest that audiovisual suppression effects occur for familiar German and unfamiliar English words, whereas audiovisual congruency effects can be found only for familiar German words, probably due to lexical-semantic processes involved. Moreover, congruency effects were characterized by topographic differences, indicating that different sources are active during processing of congruent compared to incongruent audiovisual words. Furthermore, no clear associations between audiovisual integration and reading fluency were found. The degree to which such associations develop in beginning readers remains open to further investigation.
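The two integration contrasts (suppression: sumAV − AVM; congruency: AVN − AVM) are simple difference waves over channel-by-time ERP arrays. A minimal NumPy sketch with illustrative array names (not the study's actual data or pipeline):

```python
import numpy as np

def av_integration_effects(erp_a, erp_v, erp_avm, erp_avn):
    """Compute the two audiovisual integration contrasts.

    Each input is an (n_channels, n_timepoints) ERP array. Unimodal
    responses are summated (sumAV = A + V); integration is assessed as
    suppression (sumAV - AVM) and congruency (AVN - AVM) difference
    waves.
    """
    sum_av = erp_a + erp_v
    suppression = sum_av - erp_avm
    congruency = erp_avn - erp_avm
    return suppression, congruency

def gfp(erp):
    """Global field power: spatial standard deviation across channels
    at each timepoint, used to characterise map strength."""
    return erp.std(axis=0)
```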
Speech rhythm in terms of durational variability of different levels of phonetic intervals can vary between speakers. The present article examines the role of syllabic intensity characteristics in rhythmic variability. Mean and peak intensity variability across syllables (stdevM, varcoM, stdevP, varcoP, rPVIm, nPVIm, rPVIp, nPVIp; henceforth: intensity measures) were investigated as a function of speaker in a database where within-speaker variability was strong (BonnTempo) and another database designed to examine between-speaker rhythmic variability (TEVOID). It was found that the intensity measures varied significantly between speakers in both databases. Semiautomatic speaker recognition based on duration measures (%V, ∆V(ln), ∆C(ln), ∆Peak(ln), ∆Syll(ln) and nPVISyll) and intensity measures using multinomial logistic regression and feedforward neural networks was carried out for the two databases. Results showed that intensity measures contained stronger speaker-specific information compared to measures based on durational variability of phonetic intervals. In addition, effects of the recognition algorithms (speaker recognition using multinomial logistic regression was significantly better than neural networks for BonnTempo) and data normalisation procedures (z-score normalised data was significantly better than non-normalised data in TEVOID) were discovered. This means that syllable intensity characteristics play an important role in between-speaker rhythmic differences and possibly in speech rhythm variability in general.
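The varco and PVI families of measures follow simple formulas: varco is the coefficient of variation scaled to percent, rPVI is the mean absolute difference between successive values, and nPVI additionally normalises each difference by the local mean. A minimal sketch (the study's own implementation is not shown):

```python
import statistics

def varco(values):
    """Variation coefficient: 100 * sample SD / mean (varcoM, varcoP)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

def rpvi(values):
    """Raw pairwise variability index: mean absolute successive
    difference between values (rPVIm, rPVIp)."""
    return statistics.mean(abs(a - b) for a, b in zip(values, values[1:]))

def npvi(values):
    """Normalised PVI: successive differences scaled by the local mean
    of each pair, times 100 (nPVIm, nPVIp)."""
    return 100.0 * statistics.mean(
        abs(a - b) / ((a + b) / 2.0) for a, b in zip(values, values[1:]))
```

The same functions apply whether the inputs are syllable mean intensities, syllable peak intensities, or interval durations, which is what allows the duration- and intensity-based measures to be compared head to head.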
We model the amplitude envelope of a speech signal as a kinematic system and calculate its basic parameters: displacement, velocity, and acceleration. Such a system captures the smoothed amplitude fluctuation pattern over time, illustrating how energy is distributed across the signal. Although pulmonic air pressure is the primary energy source of speech, the amplitude modulation pattern is largely determined by articulatory behaviors, especially mandible and lip movements. There should therefore be a correspondence between signal envelope kinematics and articulator kinematics. Previous research has shown substantial speaker idiosyncrasy in articulation, and such idiosyncrasy should be reflected in the envelope kinematics as well. From the signal envelope kinematics, it may thus be possible to infer individual articulatory behaviors. This is particularly useful for forensic phoneticians, who usually have no access to articulatory data, and for clinical speech pathologists, who usually find it difficult to make articulatory measurements in clinical consultations.
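A minimal sketch of the envelope-as-kinematic-system idea: the envelope (displacement) is estimated here by rectification plus moving-average smoothing, and velocity and acceleration are its first and second time derivatives. The envelope estimator and smoothing window are assumptions for illustration, not the method of the study:

```python
import numpy as np

def envelope_kinematics(signal, fs, win_s=0.02):
    """Treat the smoothed amplitude envelope as a kinematic system.

    signal: 1-D sample array; fs: sampling rate in Hz; win_s: smoothing
    window in seconds (an illustrative default). Returns displacement
    (the envelope), velocity, and acceleration as arrays of the same
    length as the input.
    """
    win = max(1, int(win_s * fs))
    kernel = np.ones(win) / win
    # displacement: rectified signal smoothed by a moving average
    displacement = np.convolve(np.abs(signal), kernel, mode='same')
    # velocity and acceleration: successive time derivatives
    velocity = np.gradient(displacement, 1.0 / fs)
    acceleration = np.gradient(velocity, 1.0 / fs)
    return displacement, velocity, acceleration
```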