Speech Enhancement, Modeling and Recognition- Algorithms and Applications 2012
DOI: 10.5772/36466
|View full text |Cite
|
Sign up to set email alerts
|

Automatic Visual Speech Recognition

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
3
1
1

Citation Types

0
5
0

Year Published

2014
2014
2024
2024

Publication Types

Select...
5
3
1

Relationship

0
9

Authors

Journals

citations
Cited by 10 publications
(5 citation statements)
references
References 50 publications
0
5
0
Order By: Relevance
“…This dataset and the solutions evaluated on it predated the deep learning revolution. Rothkrantz et al achieved the state-of-the-art result on the NDUTAVSC dataset with an accuracy of 84.27% [73]. This dataset is not used as a metric for many ALR models due to the fact that it is in Dutch and due to the lack of variation within the dataset.…”
Section: Ndutavscmentioning
confidence: 99%
“…This dataset and the solutions evaluated on it predated the deep learning revolution. Rothkrantz et al achieved the state-of-the-art result on the NDUTAVSC dataset with an accuracy of 84.27% [73]. This dataset is not used as a metric for many ALR models due to the fact that it is in Dutch and due to the lack of variation within the dataset.…”
Section: Ndutavscmentioning
confidence: 99%
“…Audio information detects the acoustic waveform of a speaker, whereas visual information detects lip movements [1]. Despite the challenges such as auditory recognition in noisy environments, audiovisual speech recognition (AVSR) is widely investigated and is reported to exhibit excellent recognition capabilities [2][3][4][5][6]. AVSR is used in technologies such as Microsoft Azure, Google Assistant, and Amazon Alexa, which convert analog signals into digital formats by acoustically analyzing speech and automatically transcribing it into Sanghun Jeon, Jieun Lee, and Dohyeon Yeo equally contributed to this work.…”
Section: Introductionmentioning
confidence: 99%
“…Vision plays a crucial role in speech understanding, and the importance of utilizing visual information to improve the performance and robustness of speech recognition has been demonstrated [ 2 , 3 , 4 ]. Although acoustic information is richer than visual information when speaking, most people rely on watching lip movements to fully understand speech [ 2 ]. Furthermore, people rely on visual information in noisy environments where receiving auditory information is challenging.…”
Section: Introductionmentioning
confidence: 99%