“…Speech is produced through the temporal overlap of articulatory gestures of the lips, tongue tip, tongue body, tongue dorsum, velum, and larynx, which regulate constrictions in different parts of the vocal tract [1]. Knowledge of articulatory kinematics, together with acoustic information, has shown benefits in various applications such as speech recognition [2,3], speech synthesis [4,5], speaker verification [6], and multimedia applications [7,8,9]. With advances in deep learning techniques, articulatory information has also proven successful in silent speech interfaces (which benefit patients who have lost their voice due to laryngectomy or diseases affecting the vocal folds), for example in speech recognition [10] and in speech synthesis driven by articulatory position information alone [11,12].…”