2019
DOI: 10.5391/ijfis.2019.19.1.1
Visual Speech Recognition of Korean Words Using Convolutional Neural Network

Abstract: In recent studies, speech recognition performance has been greatly improved by using HMMs and CNNs: the HMM performs statistical modeling of the voice to construct an acoustic model, while the CNN reduces the error rate by predicting speech from images of the mouth region. In this paper, we propose visual speech recognition (VSR) using lip images. To implement VSR, we repeatedly recorded three subjects speaking 53 words chosen from an emergency medical service vocabulary book. To extract images of consonants, vowels, and fin…
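The pipeline the abstract describes (mouth-region images classified by a CNN into one of 53 words) can be illustrated with a minimal sketch. This is an assumption-laden placeholder, not the authors' published network: the `LipCNN` name, the 64x64 grayscale input size, and all layer widths below are invented for illustration; only the 53-class output comes from the paper.

```python
import torch
import torch.nn as nn

class LipCNN(nn.Module):
    """Minimal CNN word classifier over grayscale mouth-region crops.

    Hypothetical architecture for illustration only; the paper's actual
    network layout is not reproduced here.
    """
    def __init__(self, num_classes: int = 53):  # 53 words per the abstract
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),  # 1-channel lip crop
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 64x64 -> 32x32
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 32x32 -> 16x16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),                 # one logit per word
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = LipCNN()
dummy = torch.randn(8, 1, 64, 64)  # batch of 8 hypothetical 64x64 lip crops
logits = model(dummy)              # shape: (8, 53), one score per word class
```

Trained with a standard cross-entropy loss over the word labels, such a classifier would map each lip image (or stacked frames, if the channel dimension is widened) to a vocabulary entry; the recorded 53-word emergency-vocabulary dataset would supply the training pairs.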

Cited by 3 publications (2 citation statements)
References 14 publications
“…Jo et al [16] collected predefined syllables of a single speaker with 7 views. Also, Lee and Park [22] and Lee et al [23] collected predefined word utterances, such as digits and city names, of 56 and 9 speakers, respectively. Unfortunately, the size of all the datasets is too minuscule to support deep learning-driven models; moreover, some are not publicly available.…”
Section: Related Work
confidence: 99%
“…1) Korean speech recognition: Not only the size of OLKAVS greatly outplays the previous audio-visual speech datasets [16,22,23], but also comparable to the audio-only Korean speech dataset [25]. Also, our pre-trained audio-visual speech recognition model can be useful when fine-tuned to other languages [10].…”
Section: Additional Use Cases
confidence: 99%