An Embedded System for In-Vehicle Visual Speech Activity Detection

Libal, Vit; Connell, Jonathan H.; Potamianos, G.; Marcheret, Etienne

doi:10.1109/mmsp.2007.4412866

Cited by 5 publications

(4 citation statements)

References 5 publications

“…(1) Feature extraction extracts attention-related visual features (ostensive-stimuli) from an image sequence and/or audio features from a sound stream. Various visual features are often chosen to be used as stimuli for the attention system such as the distance between a robot and a person [12,13], the head direction of the people participating in an interaction [14,15,16,17,18,19], and/or visual speaking status detection [20,21,22,23,24,25]. When audio features are used for the attention model, the direction of a sound source and the distance to a sound source are usually adopted [26,27,28].…”

Section: Introduction (mentioning)

confidence: 99%

Vision-Based Attentiveness Determination Using Scalable HMM Based on Relevance Theory

Tiawongsombat, Jeong, Pirayawaraporn, et al. 2019. Sensors

Attention capability is an essential component of human–robot interaction. Several robot attention models have been proposed which aim to enable a robot to identify the attentiveness of the humans with which it communicates and gives them its attention accordingly. However, previous proposed models are often susceptible to noisy observations and result in the robot’s frequent and undesired shifts in attention. Furthermore, most approaches have difficulty adapting to change in the number of participants. To address these limitations, a novel attentiveness determination algorithm is proposed for determining the most attentive person, as well as prioritizing people based on attentiveness. The proposed algorithm, which is based on relevance theory, is named the Scalable Hidden Markov Model (Scalable HMM). The Scalable HMM allows effective computation and contributes an adaptation approach for human attentiveness; unlike conventional HMMs, Scalable HMM has a scalable number of states and observations and online adaptability for state transition probabilities, in terms of changes in the current number of states, i.e., the number of participants in a robot’s view. The proposed approach was successfully tested on image sequences (7567 frames) of individuals exhibiting a variety of actions (speaking, walking, turning head, and entering or leaving a robot’s view). From these experimental results, Scalable HMM showed a detection rate of 76% in determining the most attentive person and over 75% in prioritizing people’s attention with variation in the number of participants. Compared to recent attention approaches, Scalable HMM’s performance in people attention prioritization presents an approximately 20% improvement.

“…However, systems have only been examined in unrealistic scenarios. There are few attempts to incorporate the visual modality [3,4] in real-time system. Recently, one notable attempt has been the work of Libal et.…”

Section: Introduction (mentioning)

confidence: 99%

“…Recently, one notable attempt has been the work of Libal et. al [4], who developed a real-time system to recognize visual speech activity on low cost embedded platforms. This system uses a camera mounted on the rearview mirror to monitor the driver.…”

Section: Introduction (mentioning)

confidence: 99%

Lip detection for audio-visual speech recognition in-car environment

Navarathna, Lucey, Fookes, et al. 2010. 10th International Conference on Information Science, Signal Processing and Their Applications (ISSPA 2010)

Acoustically, car cabins are extremely noisy and as a consequence audio-only, in-car voice recognition systems perform poorly. As the visual modality is immune to acoustic noise, using the visual lip information from the driver is seen as a viable strategy in circumventing this problem by using audio-visual automatic speech recognition (AVASR). However, implementing AVASR requires a system being able to accurately locate and track the driver's face and lip area in real-time. In this paper we present such an approach using the Viola-Jones algorithm. Using the AVICAR [1] in-car database, we show that the Viola-Jones approach is a suitable method of locating and tracking the driver's lips despite the visual variability of illumination and head pose for audio-visual speech recognition system.

“…In this paper, we propose an extraction method of lip movement images from successive image frames in the speech activity extraction process [5] which is preprocessing phase of speech recognition. The image frames are acquired from the PC image camera.…”

Section: Introduction (mentioning)

confidence: 99%

An Extraction Method of Lip Movement Images from Successive Image Frames in the Speech Activity Extraction Process

Kim, Lee, Park. 2010. Lecture Notes in Computer Science

Abstract. In this paper, we propose an extraction method of lip movement images from successive image frames and present the possibility to utilize lip movement images in the speech activity extraction process of speech recognition phase. The image frames are acquired from the PC image camera with the assumption that facial movement is limited during talking. First of all, one new lip movement image frame is generated with comparing two successive image frames each other. Second, the fine image noises are removed. Each fitness rate is calculated by comparing the lip feature data as objectly separated images. It is analyzed whether or not there is the lip movement image through verification to the objects and three images which have higher rates in their fitnesses. As a result of linking the speech & image processing system, the interworking rate shows 99.3% even in the various illumination environments. It was visually confirmed that lip movement images are tracked and can be utilized in speech activity extraction process.
