2023
DOI: 10.3390/s23218770

A Facial Feature and Lip Movement Enhanced Audio-Visual Speech Separation Model

Guizhu Li, Min Fu, Mengnan Sun, et al.

Abstract: The cocktail party problem can be addressed more effectively by leveraging both the speaker’s visual and audio information. This paper proposes a method that improves audio separation using two visual cues: facial features and lip movement. First, residual connections are introduced in the audio separation module to extract detailed features. Second, because the video stream contains information other than the face that has minimal correlation with the audio, an attention mechanism is employed in the…

Citations: cited by 0 publications
References: 61 publications (57 reference statements)