2013
DOI: 10.1016/j.specom.2013.02.005

Comprehensive many-to-many phoneme-to-viseme mapping and its application for concatenative visual speech synthesis

Cited by 25 publications (21 citation statements)
References 45 publications
“…In this section, we describe the decision tree-based viseme clustering methods first proposed in (Galanes, Unverferth, Arslan, & Talkin, 1998; Rademan & Niesler, 2015), and subsequently expanded to many-to-many phoneme-to-viseme mappings in (Mattheyses et al., 2013; Rademan & Niesler, 2015). Both contributions discuss the application of regression trees to the grouping of static visemes.…”
Section: Viseme Mapping
confidence: 99%
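The many-to-many mapping described above, in which a single phoneme can belong to different viseme classes depending on its visual context while each viseme class covers several phonemes, can be sketched as a context-keyed lookup. The phoneme labels, context labels, and viseme class names below are illustrative placeholders, not the classes derived in the cited papers:

```python
# Illustrative many-to-many phoneme-to-viseme table (hypothetical classes).
# One phoneme may map to different visemes depending on context, and one
# viseme class groups several phonemes.
CONTEXT_MAP = {
    # (phoneme, context) -> viseme class
    ("t", "rounded"): "V_round",    # /t/ adjacent to a rounded vowel
    ("t", "spread"): "V_spread",    # /t/ adjacent to a spread vowel
    ("d", "rounded"): "V_round",
    ("d", "spread"): "V_spread",
    ("p", None): "V_bilabial",      # context-independent bilabial closure
    ("b", None): "V_bilabial",
}

def phoneme_to_viseme(phoneme, context=None):
    """Resolve a phoneme to a viseme class, falling back to the
    context-free entry when no context-specific entry exists."""
    return CONTEXT_MAP.get((phoneme, context),
                           CONTEXT_MAP.get((phoneme, None)))
```

A regression-tree approach, as in the cited work, would learn such a table automatically by splitting phoneme instances on contextual questions; the dictionary here simply shows the shape of the resulting mapping.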
“…They are instructed to score the visual speech without regard to how the sound of the test sample is produced, so that the audio does not influence their scoring. However, the sound volume remains on [20].…”
Section: Intelligibility of the Synthesized Speech
confidence: 99%
“…The participants are asked to score the accuracy of the synchronization of the sound with the mouth movements [20]. The measurement tool is a MOS scale ranging from 1 to 5: 1: Bad (extremely asynchronous), 2: Poor (asynchronous), 3: Fair (fairly synchronous), 4: Good (synchronous), and 5: Excellent (truly synchronous).…”
Section: The Accuracy of the Synchronization of the Sound with the Mouth Movements
confidence: 99%
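The 1-to-5 MOS protocol above is straightforward to tabulate. A minimal sketch (the function name and label table are my own, not from the cited evaluation):

```python
# Labels of the 1-5 MOS scale used in the evaluation described above.
MOS_LABELS = {1: "Bad", 2: "Poor", 3: "Fair", 4: "Good", 5: "Excellent"}

def mean_opinion_score(ratings):
    """Average the per-participant scores on the 1-5 MOS scale."""
    if not all(1 <= r <= 5 for r in ratings):
        raise ValueError("MOS ratings must lie in [1, 5]")
    return sum(ratings) / len(ratings)
```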
“…Sample-based approaches concatenate visual speech units contained in a database, where the units may be of fixed length (e.g. phonemes, visemes, or words [4,5,6,7]) or of variable length [8,9,10]. A cost function, based on phonetic context and smoothness of concatenation, is then minimised to find the set of units that form the animation.…”
Section: Introduction
confidence: 99%
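The unit-selection scheme described above, minimising a per-unit target cost plus a join-smoothness concatenation cost over candidate units, is typically solved with a Viterbi-style dynamic program. A minimal sketch follows; the cost functions are caller-supplied placeholders rather than the specific costs used in the cited systems:

```python
def select_units(candidates, target_cost, concat_cost):
    """Viterbi search over candidate units per time step, minimising the
    total target + concatenation cost (a generic unit-selection sketch).

    candidates: list of lists; candidates[i] holds the units for step i.
    target_cost(i, u): cost of using unit u at step i.
    concat_cost(p, u): cost of joining unit p to unit u.
    """
    n = len(candidates)
    best = [{} for _ in range(n)]  # best[i][j] = (cost, backpointer)
    for j, u in enumerate(candidates[0]):
        best[0][j] = (target_cost(0, u), None)
    for i in range(1, n):
        for j, u in enumerate(candidates[i]):
            tc = target_cost(i, u)
            cost, back = min(
                (best[i - 1][k][0] + concat_cost(p, u) + tc, k)
                for k, p in enumerate(candidates[i - 1])
            )
            best[i][j] = (cost, back)
    # Backtrack from the cheapest final candidate.
    j = min(best[-1], key=lambda k: best[-1][k][0])
    path = [j]
    for i in range(n - 1, 0, -1):
        j = best[i][j][1]
        path.append(j)
    path.reverse()
    return [candidates[i][path[i]] for i in range(n)]
```

In a real system the target cost would encode phonetic-context mismatch and the concatenation cost would measure visual discontinuity at the join, per the description in the excerpt.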
“…The first decomposes the input text into phonetic units. Although phonemes have been used widely in speech processing, they have been shown to be suboptimal as visual speech units [6]. Instead, we propose using dynamic visemes as speech units and compare their performance to phonetic units before combining both.…”
Section: Introduction
confidence: 99%