2016
DOI: 10.1007/978-3-319-49685-6_30

A Data Driven Approach to Audiovisual Speech Mapping

Abstract: The concept of using visual information as part of audio speech processing has been of significant recent interest. This paper presents a data-driven approach that estimates audio speech acoustics using only temporal visual information, without considering linguistic features such as phonemes and visemes. Audio (log filterbank) and visual (2D-DCT) features are extracted, and various configurations of MLP and datasets are used to identify optimal results, showing that given a sequence of pri…
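As a rough illustration of the mapping the abstract describes, the sketch below extracts 2D-DCT coefficients from a lip region and log filterbank energies from an audio frame, then fits an MLP to regress audio features from visual ones. The ROI size, coefficient counts, simplified filterbank, and network shape are all assumptions made for illustration, not the configurations reported in the paper.

```python
"""Minimal sketch of visual-to-audio feature mapping: 2D-DCT visual
features -> MLP -> log filterbank audio features. All sizes here are
illustrative assumptions, not the paper's reported settings."""
import numpy as np
from scipy.fftpack import dct
from sklearn.neural_network import MLPRegressor

def visual_features(lip_roi, n_coeffs=50):
    """2D-DCT of a grayscale lip ROI; keep low-order (top-left)
    coefficients, which carry most of the shape information."""
    d = dct(dct(lip_roi, axis=0, norm='ortho'), axis=1, norm='ortho')
    k = int(np.sqrt(n_coeffs)) + 1
    return d[:k, :k].ravel()[:n_coeffs]

def log_filterbank(frame, n_bands=23, eps=1e-8):
    """Crude log filterbank: log energy in equal-width bands of the
    power spectrum (a stand-in for a mel-scaled bank)."""
    spec = np.abs(np.fft.rfft(frame)) ** 2
    bands = np.array_split(spec, n_bands)
    return np.log(np.array([b.sum() for b in bands]) + eps)

# Toy data: random arrays stand in for a time-aligned AV corpus.
rng = np.random.default_rng(0)
X = np.stack([visual_features(rng.random((32, 32))) for _ in range(200)])
y = np.stack([log_filterbank(rng.standard_normal(400)) for _ in range(200)])

mlp = MLPRegressor(hidden_layer_sizes=(100,), max_iter=500, random_state=0)
mlp.fit(X, y)                 # learn the visual -> audio mapping
est = mlp.predict(X[:1])      # estimated log filterbank frame
```

In practice the visual and audio frames would come from a time-aligned audiovisual corpus rather than random arrays, with the visual frame rate upsampled or the audio frames pooled so the two streams align.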

Cited by 8 publications (11 citation statements) | References 13 publications
“…In addition, to ensure good lip tracking, each sentence is manually validated by inspecting a few frames from each sentence. The aim of manual validation is to delete those sentences in which lip regions are not correctly identified [31]. Lip tracking optimisation lies outside the scope of the present work.…”
Section: Visual Feature Extraction
confidence: 99%
“…In addition, to ensure good lip tracking, each sentence is manually validated by inspecting a few frames from each sentence. The aim of manual validation is to delete those sentences in which lip regions are not correctly identified (Abel et al., 2016; Adeel et al., 2019b).…”
Section: Audio-visual Corpus and Feature Extraction
confidence: 99%
“…In contrast, not much work has been conducted to model lip reading as a regression problem for speech enhancement [27][28][29]. 2) A critical analysis of the proposed LSTM-based lip-reading regression model and its comparison with the conventional MLP-based regression model [31], where the LSTM model has shown a better capability to learn the correlation between lip movements and speech than conventional MLP models, particularly when different numbers of prior visual frames are considered. 3) Addressed limitations of the state-of-the-art VWF by presenting a novel EVWF.…”
Section: Introduction
confidence: 99%
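The contrast the statement above draws between MLP and LSTM regression can be sketched as follows: the MLP sees a window of prior visual frames as one flattened vector, while the LSTM consumes the same window as an ordered sequence and so models temporal structure explicitly. Window length, feature dimensions, and layer widths below are illustrative assumptions, not values from either cited model.

```python
"""Sketch contrasting the two regression models discussed above:
an MLP over a flattened window of prior visual frames vs. an LSTM
over the same window as a sequence. Sizes are assumed."""
import numpy as np
from tensorflow.keras import layers, models

N_PRIOR, VIS_DIM, AUD_DIM = 4, 50, 23   # assumed window/feature sizes

# MLP baseline: prior visual frames flattened into one vector.
mlp = models.Sequential([
    layers.Input(shape=(N_PRIOR * VIS_DIM,)),
    layers.Dense(100, activation='relu'),
    layers.Dense(AUD_DIM),               # linear output: log filterbank
])

# LSTM variant: the same frames presented as an ordered sequence.
lstm = models.Sequential([
    layers.Input(shape=(N_PRIOR, VIS_DIM)),
    layers.LSTM(100),
    layers.Dense(AUD_DIM),
])

for m in (mlp, lstm):
    m.compile(optimizer='adam', loss='mse')

# Toy data: 200 random windows of prior visual frames.
rng = np.random.default_rng(0)
seq = rng.random((200, N_PRIOR, VIS_DIM)).astype('float32')
tgt = rng.random((200, AUD_DIM)).astype('float32')
mlp.fit(seq.reshape(200, -1), tgt, epochs=2, verbose=0)
lstm.fit(seq, tgt, epochs=2, verbose=0)
```

The design difference is confined to the input treatment: both networks regress the same audio feature target, so any gain from the LSTM can be attributed to its modelling of the ordering across prior visual frames.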