“…From this short literature review, we can conclude that pixel-based feature extraction techniques [1,3,5,14,17,20] are, in general, better suited to encoding lip dynamics in a compact representation than contour-based feature extraction methods [8,12,15]. Based on this conclusion, we formulated visual speech recognition as the process of recognizing individual words based on a new manifold representation.…”