2021
DOI: 10.3390/fi13070182

Multi-Angle Lipreading with Angle Classification-Based Feature Extraction and Its Application to Audio-Visual Speech Recognition

Abstract: Recently, automatic speech recognition (ASR) and visual speech recognition (VSR) have been widely researched owing to developments in deep learning. Most VSR research focuses only on frontal face images. However, in real scenes, a VSR system should correctly recognize spoken content not only from frontal faces but also from diagonal or profile faces. In this paper, we propose a novel VSR method that is applicable to faces captured at any angle. Firstly, view classification is carried out to esti…
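As a rough illustration of the pipeline the abstract outlines, the sketch below first classifies the face angle and then routes the lip image to an angle-specific feature extractor. This is a hedged approximation, not the authors' implementation: the number of angle classes, the layer sizes, and the hard routing scheme are all assumptions.

```python
# Hedged sketch of angle classification-based feature extraction (assumed architecture).
import torch
import torch.nn as nn

class AngleClassifier(nn.Module):
    """Predicts a discrete face angle (e.g., frontal / diagonal / profile)."""
    def __init__(self, num_angles=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, num_angles),
        )

    def forward(self, x):
        return self.net(x)  # logits over angle classes

class MultiAngleLipEncoder(nn.Module):
    """One feature extractor per angle class; the classifier selects which one to use."""
    def __init__(self, num_angles=5, feat_dim=256):
        super().__init__()
        self.angle_clf = AngleClassifier(num_angles)
        self.extractors = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, feat_dim),
            )
            for _ in range(num_angles)
        ])

    def forward(self, lip_frame):
        # Per-sample hard routing: predict the angle, then apply that angle's extractor.
        angle = self.angle_clf(lip_frame).argmax(dim=1)
        feats = torch.stack([self.extractors[a](x.unsqueeze(0)).squeeze(0)
                             for a, x in zip(angle.tolist(), lip_frame)])
        return feats, angle

# Usage: a batch of 4 grayscale lip-region frames (1 x 64 x 64 each).
frames = torch.randn(4, 1, 64, 64)
features, predicted_angle = MultiAngleLipEncoder()(frames)
print(features.shape, predicted_angle.shape)  # torch.Size([4, 256]) torch.Size([4])
```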

Cited by 9 publications (3 citation statements) · References 23 publications
“…When teachers exercise their educational disciplinary power, the characteristics of their speech heighten students' auditory sensitivity, and students in turn analyze those speech characteristics by ear. In this section, a CNN and an LSTM network, combined with a Gammatone auditory filter, are used to simulate how students listen to the features of teachers' speech, which provides a basis for predicting changes in students' auditory emotion while educational discipline is being exercised [27][28]. The teacher's speech signal is first segmented into equal-length segments, many of which have energy so low that the human ear can hardly perceive the emotional information they contain.…”
Section: Teacher Speech Feature Extraction Based on Cochlear Filtering (mentioning, confidence: 99%)
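The excerpt above describes splitting the teacher's speech signal into equal-length segments and discarding the ones whose energy is too low to carry perceptible emotional content. The following is a minimal sketch of that pre-processing step only, assuming NumPy; the segment length and the energy threshold are arbitrary choices, since neither is specified in the excerpt.

```python
# Hedged sketch: equal-length segmentation plus removal of near-silent segments.
import numpy as np

def segment_and_filter(signal, sr, seg_len_s=0.5, energy_ratio=0.05):
    """Return equal-length segments whose mean energy exceeds a fraction of the maximum."""
    seg_len = int(seg_len_s * sr)
    n_segs = len(signal) // seg_len
    segments = signal[: n_segs * seg_len].reshape(n_segs, seg_len)
    energy = np.mean(segments ** 2, axis=1)          # per-segment mean energy
    keep = energy > energy_ratio * energy.max()      # drop segments the ear would miss
    return segments[keep]

# Usage with a synthetic 3-second signal at 16 kHz whose first second is silent.
sr = 16000
t = np.arange(3 * sr) / sr
speech_like = np.sin(2 * np.pi * 220 * t) * (t > 1.0)
kept = segment_and_filter(speech_like, sr)
print(kept.shape)  # (4, 8000): the two silent half-second segments are removed
```

The retained segments would then feed the Gammatone filterbank and the CNN-LSTM front end described in the excerpt.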
“…The MPEG-7 standard defines 17 time- and frequency-domain descriptors, among which audio signature features represent information unique to a piece of audio and are therefore often used for audio recognition [26]. In the proposed method, the audio stream is first extracted from the dance video; audio signature features are then extracted from that stream, and an audio dictionary is constructed following the bag-of-words model.…”
Section: Audio Feature Extraction (mentioning, confidence: 99%)
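The excerpt describes extracting the audio stream from the dance video, computing audio signature features, and building an audio dictionary with a bag-of-words model. The sketch below is an assumption-laden approximation of that idea: MFCC frames stand in for the MPEG-7 audio-signature descriptors, librosa and scikit-learn supply the feature extraction and codebook clustering, and the file paths are placeholders; only the overall dictionary-plus-histogram construction follows the excerpt.

```python
# Hedged sketch: bag-of-words audio dictionary over frame-level features.
import numpy as np
import librosa
from sklearn.cluster import KMeans

def frame_features(path, sr=22050, n_mfcc=13):
    """Frame-level features for one audio file (extracted beforehand from the video)."""
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # shape: (frames, n_mfcc)

def build_dictionary(feature_list, n_words=64):
    """Cluster all frames into n_words codewords (the audio dictionary)."""
    all_frames = np.vstack(feature_list)
    return KMeans(n_clusters=n_words, n_init=10, random_state=0).fit(all_frames)

def bag_of_words(features, dictionary):
    """Normalized histogram of codeword assignments for one clip."""
    words = dictionary.predict(features)
    hist = np.bincount(words, minlength=dictionary.n_clusters).astype(float)
    return hist / hist.sum()

# Usage (paths are placeholders; pull the audio track out of each video first,
# e.g. with ffmpeg):
# feats = [frame_features(p) for p in ["clip1.wav", "clip2.wav"]]
# dictionary = build_dictionary(feats)
# clip_vector = bag_of_words(feats[0], dictionary)
```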
“…Extracting speaker-specific information from complex speech data is trivial for humans but a challenging task for computers. With the development of artificial intelligence, deep learning was introduced into the field of speech recognition in 2009 [1][2][3][4]. In just a few years, it has come to be widely used in speech recognition, speaker recognition, text recognition, emotion recognition, and other related fields.…”
Section: Introduction (mentioning, confidence: 99%)