“…The concept of a visual phoneme (viseme) does not imply an explicit definition of the lip configuration during phoneme utterance. Visemes are derived from human perception: responses are categorized using a confusion matrix, and the most accurately detected visemes form a phoneme-viseme table (Williams, Rutledge, Garstecki, & Katsaggelos, 1997). The deficiency of this method is evident from the fact that various phoneme-viseme tables are in use (Goldschen, Garcia, & Petajan, 1994; Hazen, Saenko, La, & Glass, 2004; Jiang, Alwan, Auer, & Bernstein, 2001).…”
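
As a rough illustration of how a phoneme-viseme table might be derived from a perceptual confusion matrix, the following minimal Python sketch merges phonemes that observers frequently confuse with one another. It is not the procedure of any of the cited works. The function name `group_visemes`, the averaging of the two confusion directions, the threshold values, and the confusion counts are all assumptions invented for this example.

```python
from collections import defaultdict


def group_visemes(phonemes, confusion, threshold=0.25):
    """Group phonemes into viseme classes (hypothetical helper).

    confusion[i][j] is the number of times phoneme i was presented
    and phoneme j was reported. Two phonemes are merged when their
    mean mutual confusion rate is at least `threshold`.
    """
    n = len(phonemes)

    # Convert raw counts to per-row confusion rates.
    rates = []
    for row in confusion:
        total = sum(row)
        rates.append([c / total if total else 0.0 for c in row])

    # Union-find over phoneme indices.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            mutual = (rates[i][j] + rates[j][i]) / 2
            if mutual >= threshold:
                parent[find(j)] = find(i)

    groups = defaultdict(list)
    for i, phoneme in enumerate(phonemes):
        groups[find(i)].append(phoneme)
    return sorted(groups.values())


# Invented toy counts (NOT measured data): rows are presented
# phonemes, columns are the phonemes observers reported.
PHONEMES = ["p", "b", "m", "f", "v"]
CONFUSION = [
    [40, 30, 28, 1, 1],
    [31, 38, 29, 1, 1],
    [27, 30, 41, 1, 1],
    [1, 1, 1, 55, 42],
    [1, 1, 1, 44, 53],
]

print(group_visemes(PHONEMES, CONFUSION, threshold=0.25))
# [['f', 'v'], ['p', 'b', 'm']]
print(group_visemes(PHONEMES, CONFUSION, threshold=0.5))
# [['b'], ['f'], ['m'], ['p'], ['v']]
```

With these toy counts, changing only the threshold changes the resulting table. This is one plausible reason, under the assumptions above, why tables built from perceptual data can differ from study to study.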