Abstract: In this paper, we propose an audio-visual speech recognition system for a person with an articulation disorder resulting from severe hearing loss. For a person with this type of articulation disorder, the speech style is so different from that of people without hearing loss that a speaker-independent model trained on unimpaired speakers is of little use for recognizing it. We investigate in this paper an audio-visual speech recognition system for a person with severe hearing loss in noisy…
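The abstract above concerns combining audio and visual streams for recognition. A common baseline for this is early (feature-level) fusion, where per-frame audio and visual feature vectors are concatenated before classification. The sketch below is a minimal illustration of that scheme; the feature dimensions and the use of plain concatenation are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def fuse_features(audio_feats, visual_feats):
    """Early (feature-level) fusion: concatenate per-frame audio and
    visual feature vectors. Assumes the two streams are already
    time-aligned to the same number of frames."""
    if audio_feats.shape[0] != visual_feats.shape[0]:
        raise ValueError("audio and visual streams must have equal frame counts")
    return np.concatenate([audio_feats, visual_feats], axis=1)

rng = np.random.default_rng(0)
audio = rng.standard_normal((50, 39))   # stand-in MFCC-like audio frames
visual = rng.standard_normal((50, 30))  # stand-in lip-region visual features
fused = fuse_features(audio, visual)
print(fused.shape)  # (50, 69)
```

Late fusion (combining per-stream recognition scores instead of features) is the main alternative, and is closer to the HMM-integration approach quoted below.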
“…Their method was evaluated on three datasets (AVEC, AVLetters, and CUAVE). Takashima et al. [13] proposed a multi-modal feature extraction method using a Convolutive Bottleneck Network (CBN) and applied it to audio-visual data. The extracted bottleneck audio and visual features were used as input to the audio or visual HMMs, and the recognition results were then integrated.…”
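The bottleneck idea in the snippet above is that a network with a deliberately narrow hidden layer compresses each input frame, and the activations of that narrow layer are taken as compact features for a downstream HMM. The sketch below shows only this feature-extraction step with random, untrained weights; the layer sizes and activations are illustrative assumptions, not the published CBN architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer sizes (not the trained CBN of the paper):
# 39-dim input frames compressed through a 9-unit bottleneck layer.
W1 = rng.standard_normal((39, 64)) * 0.1   # input  -> hidden
W2 = rng.standard_normal((64, 9)) * 0.1    # hidden -> bottleneck

def bottleneck_features(frames):
    """Map each input frame to its bottleneck-layer activation.

    frames: (n_frames, 39) array of per-frame features.
    Returns an (n_frames, 9) array of compact features that would
    be passed on to an HMM in the scheme described above.
    """
    hidden = np.tanh(frames @ W1)
    return np.tanh(hidden @ W2)

audio = rng.standard_normal((100, 39))     # stand-in audio frames
feats = bottleneck_features(audio)
print(feats.shape)  # (100, 9)
```

In the actual method the network is trained (e.g. on a classification objective) first, so the bottleneck activations carry task-relevant information rather than a random projection.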
This paper proposes a novel sequence-image representation called the concatenated frame image (CFI), two data augmentation methods for CFI, and a CFI-based convolutional neural network (CNN) framework for the visual speech recognition (VSR) task. The CFI is simple, yet it contains the spatio-temporal information of a whole image sequence. The proposed method was evaluated on OuluVS2, a public multi-view audiovisual dataset recorded from 52 subjects. Speaker-independent recognition tasks were carried out under various experimental conditions, and the proposed method achieved high recognition accuracy.
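A concatenated frame image, as described above, tiles the frames of a sequence into a single 2-D image so an ordinary image CNN can see the whole sequence at once. The sketch below is a minimal version of that tiling; the row-major layout, the grid width, and the zero-padding of short sequences are assumptions for illustration, not necessarily the paper's exact construction.

```python
import numpy as np

def concatenated_frame_image(frames, cols=4):
    """Tile a grayscale frame sequence into one 2-D image, row-major.

    frames: (T, H, W) array. If T does not fill the grid exactly,
    the sequence is padded with blank (zero) frames.
    Returns an (rows*H, cols*W) array.
    """
    t, h, w = frames.shape
    rows = -(-t // cols)                       # ceiling division
    padded = np.zeros((rows * cols, h, w), dtype=frames.dtype)
    padded[:t] = frames
    # (rows, cols, H, W) -> (rows, H, cols, W) -> (rows*H, cols*W)
    grid = padded.reshape(rows, cols, h, w)
    return grid.transpose(0, 2, 1, 3).reshape(rows * h, cols * w)

seq = np.arange(6 * 32 * 32, dtype=np.float32).reshape(6, 32, 32)
cfi = concatenated_frame_image(seq, cols=3)
print(cfi.shape)  # (64, 96)
```

The resulting image keeps spatial detail within each tile and encodes temporal order as tile position, which is what lets a standard 2-D CNN pick up spatio-temporal patterns.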
“…The model was tested on the LRW [23] dataset and experimental results were presented. Takashima et al. [5] developed a deep-learning-supported speech recognition system for people with severe hearing loss. Both voice and visual data were used in the method, and the extracted features were fed into the system for classification.…”
Section: Related Work
“…Human action recognition is an important phase of human-computer interaction [1]. Lip reading, a subcategory of human action recognition, has begun to be used in various applications [2][3][4][5][6].…”
Lip reading has recently become a popular topic, and there is a widespread literature on it within human action recognition, where deep learning methods are frequently used. In this paper, lip reading from video data is performed using self-designed convolutional neural networks (CNNs). For this purpose, both the standard and an augmented AvLetters dataset are used in the training and test stages. To optimize network performance, the mini-batch size parameter is also tuned and its effect is investigated. Additionally, experimental studies are performed using the pre-trained AlexNet and GoogLeNet CNNs. Detailed experimental results are presented.
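Tuning the mini-batch size, as the abstract above describes, amounts to sweeping a set of candidate sizes and training/evaluating once per setting. The sketch below shows only the batching mechanics of such a sweep; the dataset size and the candidate batch sizes are hypothetical stand-ins, not values from the paper.

```python
import numpy as np

def make_batches(n_samples, batch_size):
    """Split sample indices into mini-batches (last batch may be smaller)."""
    idx = np.arange(n_samples)
    return [idx[i:i + batch_size] for i in range(0, n_samples, batch_size)]

n_train = 780                       # hypothetical training-set size
for bs in (16, 32, 64):             # hypothetical candidate batch sizes
    batches = make_batches(n_train, bs)
    # In a real sweep, one would train the CNN on these batches and
    # record validation accuracy per setting.
    print(bs, len(batches), len(batches[-1]))
```

Smaller batches give noisier gradient estimates but more updates per epoch; the trade-off is exactly why the parameter is worth tuning empirically.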