“…via phonemic transcription. For direct approaches, the conversion function typically involves some form of regression [16,24,26,27] or indexing a codebook of visual features using the corresponding features extracted from the acoustic speech [3,13]. For indirect approaches, the mapping function involves concatenation or interpolation of pre-existing data [5,7,9,21,29] or the use of a generative model [2,10,17].…”
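
To make the codebook variant of the direct approach concrete, here is a minimal sketch, not taken from any of the cited systems. It assumes a codebook stored as two parallel lists, where each acoustic codeword is paired with one visual feature vector, and it assumes Euclidean nearest-neighbour lookup. The names `nearest_index` and `audio_to_visual` and all the feature values are hypothetical and exist only for illustration.

```python
import math


def nearest_index(query, entries):
    """Return the index of the entry closest to `query` (Euclidean distance)."""
    return min(range(len(entries)), key=lambda i: math.dist(query, entries[i]))


def audio_to_visual(audio_frames, audio_codebook, visual_codebook):
    """Map each acoustic feature vector to the visual vector that is
    paired with its nearest acoustic codeword."""
    return [visual_codebook[nearest_index(frame, audio_codebook)]
            for frame in audio_frames]


# Toy, made-up data: 2-D acoustic codewords paired with a single
# hypothetical visual parameter (e.g. mouth opening).
audio_codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
visual_codebook = [(0.1,), (0.8,), (0.4,)]

# Three acoustic frames to convert, one visual vector out per frame.
frames = [(0.9, 0.1), (0.1, 0.2), (0.2, 0.9)]
trajectory = audio_to_visual(frames, audio_codebook, visual_codebook)
print(trajectory)
```

A regression-based direct approach would replace the lookup with a learned continuous function from acoustic to visual features. The lookup shown here can only return visual vectors that already exist in the codebook.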