2023
DOI: 10.3390/math11122665

A Review of Recent Advances on Deep Learning Methods for Audio-Visual Speech Recognition

Abstract: This article provides a detailed review of recent advances in audio-visual speech recognition (AVSR) methods that have been developed over the last decade (2013–2023). Despite the recent success of audio speech recognition systems, the problem of audio-visual (AV) speech decoding remains challenging. In comparison to the previous surveys, we mainly focus on the important progress brought with the introduction of deep learning (DL) to the field and skip the description of long-known traditional “hand-crafted” m…

Cited by 7 publications (6 citation statements)
References 145 publications (227 reference statements)
“…Personal data processing means any operation that is performed on personal data, including the collection, recording, structuring, storage, adaptation or alteration, retrieval, use, disclosure by transmission, dissemination, combination and erasure (see Art. 4(2) in the GDPR). The entity that, alone or jointly with others, determines the purposes and means of the processing of personal data, is the so-called data controller, whereas the entity that processes personal data on behalf of the controller is the data processor (see Art.…”
Section: Personal Data Protection-legal Provisions (mentioning)
confidence: 99%
“…Machine learning (ML) and, especially, deep learning algorithms, as a type of artificial intelligence (AI) algorithms, are widely used in various fields, such as digital image processing [1], data analytics [2], autonomous systems [3], text and speech recognition [4], face recognition [5], robotics [6], traffic prediction [7], intrusion detection [8], etc. It is still a highly evolving field with a variety of innovative applications [9,10].…”
Section: Introduction (mentioning)
confidence: 99%
“…In this subsection, we evaluate recent progress in VSR methodology. More detailed research on these issues can be found in related studies [68,69].…”
Section: State-of-the-art Approaches For Lip-reading (mentioning)
confidence: 99%
“…We have reviewed the 28 existing benchmarking corpora for VSR and Audio-Visual Speech Recognition (AVSR) suitable for training DL models in our related study [68]. However, all these existing publicly available corpora (both recorded in laboratory conditions and collected from the internet) contain speech recordings only with neutral emotions, or with a minimal amount of emotions/data not labeled with emotion classes.…”
Section: Research Corpora (mentioning)
confidence: 99%
“…The face regions were detected using the popular FaceMesh model [49] from the MediaPipe [50] open source library. We resized the face region images to 224×224 and applied channel normalization.…”
Section: Visual-based Affective States Recognition (mentioning)
confidence: 99%