2013
DOI: 10.1016/j.specom.2013.02.005

Comprehensive many-to-many phoneme-to-viseme mapping and its application for concatenative visual speech synthesis

Cited by 25 publications (21 citation statements)
References 45 publications
“…In this section, we describe the decision tree-based viseme clustering methods first proposed in (Galanes, Unverferth, Arslan, & Talkin, 1998; Rademan & Niesler, 2015), and subsequently expanded to many-to-many phoneme-to-viseme mappings in (Mattheyses et al., 2013; Rademan & Niesler, 2015). Both contributions discuss the application of regression trees to the grouping of static visemes.…”
Section: Viseme Mapping
confidence: 99%
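The many-to-many mapping described above, in which a single phoneme can belong to different viseme classes depending on its visual context while each viseme class covers several phonemes, can be sketched as a context-keyed lookup. The phoneme labels, context labels, and viseme class names below are illustrative placeholders, not the classes derived in the cited papers:

```python
# Illustrative many-to-many phoneme-to-viseme table (hypothetical classes).
# One phoneme may map to different visemes depending on context, and one
# viseme class groups several phonemes.
CONTEXT_MAP = {
    # (phoneme, context) -> viseme class
    ("t", "rounded"): "V_round",    # /t/ adjacent to a rounded vowel
    ("t", "spread"): "V_spread",    # /t/ adjacent to a spread vowel
    ("d", "rounded"): "V_round",
    ("d", "spread"): "V_spread",
    ("p", None): "V_bilabial",      # context-independent bilabial closure
    ("b", None): "V_bilabial",
}

def phoneme_to_viseme(phoneme, context=None):
    """Resolve a phoneme to a viseme class, falling back to the
    context-free entry when no context-specific entry exists."""
    return CONTEXT_MAP.get((phoneme, context),
                           CONTEXT_MAP.get((phoneme, None)))
```

A regression-tree approach, as in the cited work, would learn such a table automatically by splitting phoneme instances on contextual questions; the dictionary here simply shows the shape of the resulting mapping.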
“…They are instructed to score the visual speech without regard to how the sound of the test sample is produced, so that the audio does not influence their scoring. However, the sound volume remains on [20].…”
Section: Intelligibility of the Synthesized Speech
confidence: 99%
“…The participants are asked to score the accuracy of the synchronization of the sound with the mouth movements [20]. The measurement tool is a MOS scale ranging from 1 to 5: 1: Bad (extremely asynchronous), 2: Poor (asynchronous), 3: Fair (fairly synchronous), 4: Good (synchronous), and 5: Excellent (truly synchronous).…”
Section: The Accuracy of the Synchronization of the Sound with the Mouth Movements
confidence: 99%
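The 1-to-5 MOS protocol above is straightforward to tabulate. A minimal sketch (the function name and label table are my own, not from the cited evaluation):

```python
# Labels of the 1-5 MOS scale used in the evaluation described above.
MOS_LABELS = {1: "Bad", 2: "Poor", 3: "Fair", 4: "Good", 5: "Excellent"}

def mean_opinion_score(ratings):
    """Average the per-participant scores on the 1-5 MOS scale."""
    if not all(1 <= r <= 5 for r in ratings):
        raise ValueError("MOS ratings must lie in [1, 5]")
    return sum(ratings) / len(ratings)
```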
“…Sample-based approaches concatenate visual speech units contained in a database, where the units may be of fixed length (e.g. phonemes, visemes, or words [4,5,6,7]) or of variable length [8,9,10]. A cost function, based on phonetic context and smoothness of concatenation, is then minimised to find the set of units that form the animation.…”
Section: Introduction
confidence: 99%
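The unit-selection scheme described above, minimising a per-unit target cost plus a join-smoothness concatenation cost over candidate units, is typically solved with a Viterbi-style dynamic program. A minimal sketch follows; the cost functions are caller-supplied placeholders rather than the specific costs used in the cited systems:

```python
def select_units(candidates, target_cost, concat_cost):
    """Viterbi search over candidate units per time step, minimising the
    total target + concatenation cost (a generic unit-selection sketch).

    candidates: list of lists; candidates[i] holds the units for step i.
    target_cost(i, u): cost of using unit u at step i.
    concat_cost(p, u): cost of joining unit p to unit u.
    """
    n = len(candidates)
    best = [{} for _ in range(n)]  # best[i][j] = (cost, backpointer)
    for j, u in enumerate(candidates[0]):
        best[0][j] = (target_cost(0, u), None)
    for i in range(1, n):
        for j, u in enumerate(candidates[i]):
            tc = target_cost(i, u)
            cost, back = min(
                (best[i - 1][k][0] + concat_cost(p, u) + tc, k)
                for k, p in enumerate(candidates[i - 1])
            )
            best[i][j] = (cost, back)
    # Backtrack from the cheapest final candidate.
    j = min(best[-1], key=lambda k: best[-1][k][0])
    path = [j]
    for i in range(n - 1, 0, -1):
        j = best[i][j][1]
        path.append(j)
    path.reverse()
    return [candidates[i][path[i]] for i in range(n)]
```

In a real system the target cost would encode phonetic-context mismatch and the concatenation cost would measure visual discontinuity at the join, per the description in the excerpt.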
“…The first decomposes the input text into phonetic units. Although phonemes have been used widely in speech processing, they have been shown to be suboptimal as visual speech units [6]. Instead, we propose using dynamic visemes as speech units and compare their performance to phonetic units before combining both.…”
Section: Introduction
confidence: 99%