State-of-the-art prosody modelling in content-to-speech (CTS) applications still uses the same methodology to predict intonation cues as text-to-speech (TTS) applications, namely the analysis of the generated surface sentences with respect to part of speech, syntactic dependency relations and word order. On the other hand, several theoretical studies argue that morphology, syntax, and the information (or communicative) structure that organizes a given content (semantic or deep-syntactic structure) with respect to the intention of the speaker show a strong correlation with intonation. However, little empirical work based on sufficiently large corpora has been carried out so far to support this argument. We present empirical evidence for the Information Structure-Prosody correlation using the Wall Street Journal Penn Treebank corpus recorded by native American English speakers. Our experiments reach a prosody prediction accuracy of 80% using the hierarchical information structure from the Meaning-Text Theory, compared to 59% for the baseline.
Intonation is traditionally considered to be the most important prosodic feature, and considerable research effort has accordingly been devoted to automatic segmentation and labeling of speech samples to capture intonation cues. A number of studies also show that automatic prosody labeling improves further when duration or intensity is incorporated. However, the combination of word-level acoustic features still yields poor results when machine learning techniques are applied to annotated corpora to derive intonation for speech synthesis applications. To address this problem, we present an experimental setup for the development of a hierarchical prosodic structure model which combines linguistic features, including information structure, and three acoustic elements (intensity, pitch and duration). We show empirically that this combination leads to a considerably more accurate representation of prosody and, consequently, a more reliable automatic labeling of speech corpora for machine learning.
Early Childhood Caries (ECC) remains a global issue despite numerous advancements in research and interventional approaches. Nearly 530 million children suffer from untreated dental caries of primary teeth. The consequences of such untreated dental caries not only limit the child's chewing and eating abilities but also significantly impact the child's overall growth. Research has demonstrated that ECC is associated with nearly 123 risk factors. ECC has also been associated with local pain, infections, abscesses, and altered sleep patterns. Furthermore, it can affect the child's emotional status and decrease their ability to learn or perform their usual activities. In high-income countries, dental care continues to endorse a “current treatment-based approach” that involves high-technology, interventionist, and specialized approaches. While such approaches provide immediate benefit at an individual level, they fail to intercept the underlying causes of the disease at large. In low-income and middle-income countries (LMICs), the “current treatment approach” often remains limited, unaffordable, and unsuitable for the majority of the population. Rather, dentistry needs to focus on “sustainable goals” and integrate dental care with the mainstream healthcare system and primary care services. Dental care systems should promote “early first dental visits,” when the child is 1 year of age or when the first tooth erupts. Serious shortages of appropriately trained oral healthcare personnel in certain regions of the world, a lack of appropriate technologies, the isolation of oral health services from the health system, and limited adoption of prevention and oral health promotion can pose critical barriers. Oral health care systems must focus on three major keystones to combat the burden of ECC: (1) integrating essential oral health services into healthcare in every country, ensuring that appropriate healthcare is accessible and available globally; (2) integrating oral and general healthcare to effectively prevent and manage oral disease and improve oral health; and (3) collaborating with a wide range of health workers to deliver sustainable oral health care tailored to the needs of local communities.
This paper deals with the adaptation of AuToBI annotation for speech synthesis purposes. AuToBI is a tool that automatically determines and classifies the standard ToBI labels for American English. AuToBI annotation is performed word-by-word. However, for speech synthesis applications that use various layers of linguistic annotation (syntax, semantic information and prosody structures) and, in particular, for the detection of the correlation between the information structure and prosody, a labeling of intonation patterns at the intonational phrase level is essential. We present a rule-based procedure for initial AuToBI annotation and its adaptation to a phrase-based annotation, thus avoiding a post-processing stage of the extracted labels. To validate our proposal, the outcome of the procedure is compared with manual annotation and with the patterns predicted by the information structure-prosody correlation argued for by mainstream theories.
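The difference between word-level and phrase-level labeling mentioned in this abstract can be illustrated with a minimal sketch. It is not the rule-based procedure of the paper, which the abstract does not spell out, and it does not use AuToBI's actual output format or API. It only assumes that each word carries an optional ToBI pitch accent and an optional phrase accent or boundary tone, and it relies on the ToBI convention that an intonational phrase ends at a boundary tone marked with `%`. The names `WordLabel`, `group_into_intonational_phrases` and `phrase_pattern` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WordLabel:
    """Hypothetical word-level record; not AuToBI's real output format."""
    word: str
    pitch_accent: Optional[str] = None  # e.g. "H*", "L+H*"; None if unaccented
    boundary: Optional[str] = None      # e.g. "L-", "L-L%", "H-H%"; None if no break


def group_into_intonational_phrases(words: List[WordLabel]) -> List[List[WordLabel]]:
    """Split a word sequence after every word whose boundary label ends in '%'."""
    phrases: List[List[WordLabel]] = []
    current: List[WordLabel] = []
    for w in words:
        current.append(w)
        # In ToBI, '%' marks a boundary tone, i.e. the end of an intonational phrase.
        if w.boundary is not None and w.boundary.endswith("%"):
            phrases.append(current)
            current = []
    if current:
        # Trailing words without a closing boundary tone form an open phrase.
        phrases.append(current)
    return phrases


def phrase_pattern(phrase: List[WordLabel]) -> str:
    """Concatenate all tones of a phrase, in order, into one pattern string."""
    tones: List[str] = []
    for w in phrase:
        if w.pitch_accent:
            tones.append(w.pitch_accent)
        if w.boundary:
            tones.append(w.boundary)
    return " ".join(tones)


if __name__ == "__main__":
    # Invented example labels, for illustration only.
    sentence = [
        WordLabel("the"),
        WordLabel("market", pitch_accent="H*"),
        WordLabel("fell", pitch_accent="H*", boundary="L-L%"),
        WordLabel("but"),
        WordLabel("bonds", pitch_accent="L+H*"),
        WordLabel("rose", pitch_accent="H*", boundary="L-H%"),
    ]
    for phrase in group_into_intonational_phrases(sentence):
        print(" ".join(w.word for w in phrase), "->", phrase_pattern(phrase))
```

A phrase-level pattern of this kind is the sort of unit that could then be aligned with information structure spans, whereas isolated word labels cannot be compared directly with such spans.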