2015
DOI: 10.2197/ipsjtcva.7.64

Audio-Visual Speech Recognition Using Convolutive Bottleneck Networks for a Person with Severe Hearing Loss

Abstract: In this paper, we propose an audio-visual speech recognition system for a person with an articulation disorder resulting from severe hearing loss. In the case of a person with this type of articulation disorder, the speech style is quite different from that of people without hearing loss, with the result that a speaker-independent model for unimpaired persons is hardly useful for recognizing it. We investigate in this paper an audio-visual speech recognition system for a person with severe hearing loss in noisy…

Cited by 8 publications (3 citation statements)
References 16 publications
“…Their method was evaluated on three datasets (AVEC, AVLetters, and CUAVE). Takashima et al. [13] proposed a multi-modal feature extraction method using a Convolutive Bottleneck Network (CBN) and applied it to audio-visual data. The extracted bottleneck audio and visual features were used as input to the audio and visual HMMs, and the recognition results were then integrated.…”
Section: Introduction
confidence: 99%
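As the statement above describes, a convolutive bottleneck network is a convolutional network trained with a narrow hidden layer whose activations are taken as compact features for downstream HMMs. The PyTorch sketch below is illustrative only: the layer counts, map sizes, 30-unit bottleneck, and 39x39 input are assumptions, not the values used by Takashima et al. [13].

```python
import torch
import torch.nn as nn

class ConvolutiveBottleneckNetwork(nn.Module):
    """Illustrative CBN: convolutional layers followed by a narrow
    bottleneck layer whose activations serve as compact features.
    All layer sizes here are assumptions, not the paper's values."""
    def __init__(self, n_classes: int = 40, bottleneck_dim: int = 30):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 13, kernel_size=5), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(13, 78, kernel_size=5), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.flatten = nn.Flatten()
        # assumes a 39x39 input map (e.g., stacked feature frames)
        self.fc1 = nn.Linear(78 * 6 * 6, 120)
        self.bottleneck = nn.Linear(120, bottleneck_dim)  # narrow layer
        self.out = nn.Linear(bottleneck_dim, n_classes)

    def forward(self, x):
        h = self.flatten(self.conv(x))
        h = torch.relu(self.fc1(h))
        z = torch.relu(self.bottleneck(h))  # bottleneck features
        return self.out(z), z

# After training on class labels, discard the output layer and keep
# the bottleneck activations z as the features fed to an HMM.
net = ConvolutiveBottleneckNetwork()
logits, feats = net(torch.randn(8, 1, 39, 39))
print(feats.shape)  # torch.Size([8, 30])
```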
“…The model was tested on the LRW [23] dataset and experimental results were presented. Takashima et al. [5] developed a deep-learning-based speech recognition system for people with severe hearing loss. Both voice and visual data were used in the method, and the extracted features were fed into the system for classification.…”
Section: Related Work
confidence: 99%
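The per-stream recognition results mentioned in the first statement can be integrated in several ways; the statements do not say which rule the cited work used. A minimal sketch, assuming a weighted combination of the audio and visual recognizers' per-class log-likelihoods:

```python
import numpy as np

def fuse_stream_scores(audio_loglik: np.ndarray,
                       visual_loglik: np.ndarray,
                       audio_weight: float = 0.7) -> int:
    """Late fusion of per-class log-likelihoods from the audio and
    visual recognizers. The weighting rule is an assumption; the
    cited statements only say the results 'were then integrated'."""
    combined = (audio_weight * audio_loglik
                + (1.0 - audio_weight) * visual_loglik)
    return int(np.argmax(combined))

# Toy usage: 3 candidate words, per-stream log-likelihoods.
a = np.array([-10.2, -9.1, -12.5])
v = np.array([-8.4, -9.9, -8.8])
print(fuse_stream_scores(a, v))  # index of the fused best word
```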
“…Human action recognition is an important phase of human-computer interaction [1]. Lip reading, a subcategory of human action recognition, has begun to be used in various applications [2][3][4][5][6].…”
Section: Introduction
confidence: 99%