This paper explores two related factors that influence durational variation in spontaneous speech: prosodic structure and redundancy. We argue that the constraint of producing robust communication while efficiently expending articulatory effort leads to an inverse relationship between language redundancy and duration. The inverse relationship improves communication robustness by spreading information more evenly across the speech signal, yielding a smoother signal redundancy profile. We argue that prosodic prominence is a linguistic means of achieving smooth signal redundancy: prosodic prominence increases syllable duration and coincides to a large extent with unpredictable sections of speech, and thus leads to a smoother signal redundancy profile. The results of linear regressions carried out between measures of redundancy, syllable duration and prosodic structure in a large corpus of spontaneous speech confirm: (1) an inverse relationship between language redundancy and duration, and (2) a strong relationship between prosodic prominence and duration. The fact that a large proportion of the variance predicted by language redundancy and prosodic prominence is nonunique suggests that, in English, prosodic prominence structure is the means by which the constraints imposed by a robust-signal requirement are expressed in spontaneous speech.
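The regression analysis described above can be sketched as follows. This is a minimal illustration on simulated data, not the paper's corpus or variables: the redundancy measure, prominence coding, and coefficient values are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical per-syllable predictors (illustrative only):
# log-probability of the syllable given its context (higher = more redundant)
log_prob = rng.normal(-4.0, 1.5, n)
# binary prosodic-prominence marker
prominent = rng.integers(0, 2, n).astype(float)

# Simulate the hypothesized effects: more redundant syllables are shorter,
# prominent syllables are longer (coefficients are made up for the sketch).
log_dur = 5.0 - 0.08 * log_prob + 0.25 * prominent + rng.normal(0.0, 0.1, n)

# Ordinary least-squares regression of (log) duration on both factors
X = np.column_stack([np.ones(n), log_prob, prominent])
beta, *_ = np.linalg.lstsq(X, log_dur, rcond=None)

# beta[1] < 0 reflects the inverse redundancy/duration relationship;
# beta[2] > 0 reflects the lengthening effect of prosodic prominence.
```

Because prominence and redundancy are correlated in real speech (unlike in this simulation), comparing the variance explained by each predictor alone against the joint model is what reveals the shared ("nonunique") variance the abstract refers to.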
The language redundancy of a syllable, measured by its predictability given its context and inherent frequency, has been shown to have a strong inverse relationship with syllabic duration. This relationship is predicted by the smooth signal redundancy hypothesis, which proposes that robust communication in a noisy environment can be achieved with an inverse relationship between language redundancy and the predictability given acoustic observations (acoustic redundancy). A general version of the hypothesis predicts similar relationships between the spectral characteristics of speech and language redundancy. However, investigating this claim is hampered by difficulties in measuring the spectral characteristics of speech within large conversational corpora, and difficulties in forming models of acoustic redundancy based on these spectral characteristics. This paper addresses these difficulties by testing the smooth signal redundancy hypothesis with a very high-quality corpus collected for speech synthesis, and presents both durational and spectral data from vowel nuclei on a vowel-by-vowel basis. Results confirm the duration/language redundancy results achieved in previous work, and show a significant relationship between language redundancy factors and the first two formants, although these results vary considerably by vowel. In general, however, vowels show increased centralization with increased language redundancy.
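Vowel centralization of the kind reported above is commonly quantified as a token's distance from the centre of the speaker's F1/F2 vowel space. The sketch below uses hypothetical formant values and a hypothetical per-token redundancy score; it is not the corpus data.

```python
import numpy as np

def centralization(f1, f2, centroid):
    """Euclidean distance of each token from the vowel-space centroid
    in the F1/F2 plane; smaller distance = more centralized (reduced)."""
    pts = np.column_stack([f1, f2])
    return np.linalg.norm(pts - centroid, axis=1)

# Hypothetical tokens of one vowel and a speaker's vowel-space centroid (Hz)
f1 = np.array([700.0, 650.0, 600.0, 550.0])
f2 = np.array([1200.0, 1300.0, 1400.0, 1500.0])
centroid = np.array([500.0, 1500.0])

d = centralization(f1, f2, centroid)

# Hypothetical per-token language-redundancy scores (higher = more predictable)
redundancy = np.array([0.2, 0.5, 0.8, 1.0])

# A negative correlation here would mirror the reported pattern:
# more redundant tokens lie closer to the centre of the vowel space.
r = np.corrcoef(redundancy, d)[0, 1]
```

Since the abstract notes that the effect varies considerably by vowel, such a correlation would be computed separately per vowel category rather than pooled.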
Speech interfaces are growing in popularity. Through a review of 68 research papers, this work maps the trends, themes, findings and methods of empirical research on speech interfaces in HCI. We find that most studies are usability/theory-focused or explore wider system experiences, evaluating Wizard of Oz studies, prototypes, or developed systems using self-report questionnaires to measure concepts like usability and user attitudes. A thematic analysis of the research found that speech HCI work focuses on nine key topics: system speech production, modality comparison, user speech production, assistive technology & accessibility, design insight, experiences with interactive voice response (IVR) systems, using speech technology for development, people's experiences with intelligent personal assistants (IPAs), and how user memory affects speech interface interaction. From these insights we identify gaps and challenges in speech research, notably the need to develop theories of speech interface interaction, grow critical mass in this domain, increase design work, and expand research from single-user to multiple-user interaction contexts so as to reflect current use contexts. We also highlight the need to improve measure reliability, validity and consistency, to deploy systems in the wild, and to reduce barriers to building fully functional speech interfaces for research.

Author Keywords: Speech interfaces; speech HCI; review; speech technology; voice user interfaces

Research Highlights:
• Most papers focused on usability/theory-based or wider system experience research, with an emphasis on Wizard of Oz studies and developed systems, though design work was lacking
• Questionnaires on usability and user attitudes were often used, but few were reliable or validated
• Thematic analysis showed nine primary research topics
• Gaps exist in research critical mass, speech HCI theories, and multiple-user contexts
CereProc® Ltd. has recently released a beta version of a commercial unit selection synthesiser featuring XML control of speech style. The system is freely available for academic use and allows fine control of the rendered speech, as well as full timings to interface with avatars and other animation. With reference to this system, we discuss the current state-of-the-art in commercial expressive synthesis, and argue that current approaches to synthesis, together with current commercial pressures, make it difficult for many systems to create characterful synthesis. We present how CereProc's approach differs from the industry standard and how we have attempted to maintain and increase the characterfulness of CereVoice's output. We outline the expressive synthesis markup supported by the system and how it is expressed in underlying digital signal processing and selection tags. Finally, we present the concept of second-pass synthesis, where cues can be manually tweaked to allow direct control of intonation style.