Visual speech information from the speaker's mouth region has been shown to improve the noise robustness of automatic speech recognizers, promising to extend their usability in the human-computer interface. In this paper, we review the main components of audio-visual automatic speech recognition and present novel contributions in two main areas: first, the visual front-end design, based on a cascade of linear image transforms of an appropriate video region of interest, and subsequently, audio-visual speech integration. On the latter topic, we discuss new work on combining feature and decision fusion, on modeling audio-visual speech asynchrony, and on incorporating modality reliability estimates into the bimodal recognition process. We also briefly touch upon the issue of audio-visual speaker adaptation. We apply our algorithms to three multi-subject bimodal databases, ranging from small- to large-vocabulary recognition tasks, recorded in both visually controlled and challenging environments. Our experiments demonstrate that the visual modality improves automatic speech recognition over all conditions and data considered, though less so in visually challenging environments and on large-vocabulary tasks.
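As a rough illustration of the kind of bimodal integration described above, the sketch below is not the authors' implementation: the function names, the DCT-based transform cascade, and the reliability weight are all illustrative assumptions. It extracts visual features from a mouth region of interest via a cascade of linear transforms (here, a truncated 2-D DCT) and combines per-class audio and visual log-likelihoods in a simple reliability-weighted decision-fusion step.

```python
import numpy as np
from scipy.fftpack import dct

def visual_front_end(roi, k=32):
    """Illustrative cascade of linear image transforms on a mouth ROI:
    a separable 2-D DCT followed by truncation to k coefficients."""
    coeffs = dct(dct(roi, axis=0, norm='ortho'), axis=1, norm='ortho')
    return coeffs.flatten()[:k]

def decision_fusion(audio_loglik, visual_loglik, lambda_a=0.7):
    """Weight the per-class stream log-likelihoods by an assumed audio
    reliability weight lambda_a; the visual stream gets 1 - lambda_a."""
    return lambda_a * audio_loglik + (1.0 - lambda_a) * visual_loglik

# Toy usage: one frame's mouth ROI and hypothetical per-word scores.
roi = np.random.rand(32, 32)                    # stand-in grayscale mouth ROI
v_feat = visual_front_end(roi)                  # visual feature vector
audio_ll = np.log(np.array([0.6, 0.3, 0.1]))    # hypothetical audio scores
visual_ll = np.log(np.array([0.4, 0.4, 0.2]))   # hypothetical visual scores
fused = decision_fusion(audio_ll, visual_ll)
print("fused scores:", fused, "-> best class:", int(np.argmax(fused)))
```

In a noisier acoustic environment one would lower lambda_a, shifting trust toward the visual stream; estimating that weight automatically is the modality-reliability problem the abstract refers to.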
Compelling public interest is propelling national efforts to advance the evidence base for cancer treatment and control measures and to transform the way in which evidence is aggregated and applied. Substantial investments in health information technology, comparative effectiveness research, health care quality and value, and personalized medicine support these efforts and have resulted in considerable progress to date. An emerging initiative, and one that integrates these converging approaches to improving health care, is "rapid-learning health care." In this framework, routinely collected real-time clinical data drive the process of scientific discovery, which becomes a natural outgrowth of patient care. To better understand the state of the rapid-learning health care model and its potential implications for oncology, the National Cancer Policy Forum of the Institute of Medicine held a workshop entitled "A Foundation for Evidence-Driven Practice: A Rapid-Learning System for Cancer Care" in October 2009. Participants examined the elements of a rapid-learning system for cancer, including registries and databases, emerging information technology, patient-centered and -driven clinical decision support, patient engagement, culture change, clinical practice guidelines, point-of-care needs in clinical oncology, and federal policy issues and implications. This Special Article reviews the activities of the workshop and sets the stage to move from vision to action.
We present a learning-based approach to the semantic indexing of multimedia content using cues derived from audio, visual, and text features. We approach the problem by developing a set of statistical models for a predefined lexicon. Novel concepts are then mapped in terms of the concepts in the lexicon. To achieve robust detection of concepts, we exploit features from multiple modalities, namely audio, video, and text. Concept representations are modeled using Gaussian mixture models (GMM), hidden Markov models (HMM), and support vector machines (SVM). Models such as Bayesian networks and SVMs are used in a late-fusion approach to model concepts that are not explicitly modeled in terms of features. Our experiments indicate the promise of the proposed classification and fusion methodologies: the fusion scheme achieves more than a 10% relative improvement over the best unimodal concept detector.
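A minimal sketch of the late-fusion idea, assuming per-modality concept detectors already produce confidence scores: the detector scores and labels below are synthetic, and scikit-learn's SVC stands in for whichever SVM implementation the work actually used. The combiner is trained on the concatenated unimodal scores so it learns how much to trust each modality.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic per-modality detector scores for one concept:
# columns = [audio score, visual score, text score], one row per shot.
n = 200
labels = rng.integers(0, 2, size=n)                  # 1 = concept present
scores = labels[:, None] * 0.4 + rng.random((n, 3))  # noisy unimodal scores

# Late fusion: an SVM over the stacked unimodal scores.
fusion_svm = SVC(kernel='rbf', probability=True).fit(scores[:150], labels[:150])
fused_probs = fusion_svm.predict_proba(scores[150:])[:, 1]
accuracy = np.mean((fused_probs > 0.5) == labels[150:])
print(f"fused detector accuracy on held-out shots: {accuracy:.2f}")
```

Because the fusion model sees only scores, not raw features, the same combiner can sit on top of GMM-, HMM-, or SVM-based unimodal detectors interchangeably.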
An application of neural network modeling is described for generating hypotheses about the relationships between the response properties of neurons and information processing in the auditory system. The goal is to study response properties that are useful for extracting sound-localization information from the directionally selective spectral filtering provided by the pinna. To this end, a feedforward neural network model with a guaranteed level of fault tolerance is introduced. Fault tolerance and uniform fault tolerance in a neural network are formally defined, and a method is described to ensure that the estimated network exhibits fault tolerance. The problem of estimating weights for such a network is formulated as a large-scale nonlinear optimization problem. Numerical experiments indicate that solutions with uniform fault tolerance exist for the pattern-recognition problem considered, and that solutions derived by introducing fault-tolerance constraints have better generalization properties than solutions obtained via unconstrained back-propagation.
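One simple way to operationalize the fault-tolerance idea is sketched below. This is not the paper's constrained nonlinear-optimization formulation: here the constraint is merely approximated by averaging the training loss over every single-hidden-unit ablation, so the resulting network tolerates any one stuck-at-zero hidden-unit fault. The toy XOR task, network size, and learning rate are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(X, W1, W2, mask):
    """Forward pass with a hidden-unit mask; mask[j] = 0 simulates
    a stuck-at-zero fault in hidden unit j."""
    H = sigmoid(X @ W1) * mask          # masked hidden activations
    return H, H @ W2                    # hidden layer, linear output

# Toy pattern-recognition task: XOR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([[0], [1], [1], [0]], float)

n_hidden = 8
W1 = rng.normal(0, 1, (2, n_hidden))
W2 = rng.normal(0, 1, (n_hidden, 1))
lr = 0.5

# Fault patterns: the fault-free network plus every single-unit ablation,
# so the averaged loss pushes toward uniform tolerance of any one failure.
masks = [np.ones(n_hidden)]
masks += [(np.arange(n_hidden) != j).astype(float) for j in range(n_hidden)]

for step in range(5000):
    g1 = np.zeros_like(W1)
    g2 = np.zeros_like(W2)
    for m in masks:
        H, out = forward(X, W1, W2, m)
        err = out - y                                  # MSE output gradient
        g2 += H.T @ err
        g1 += X.T @ ((err @ W2.T) * H * (1 - H))       # masked units give 0
    W1 -= lr * g1 / len(masks)
    W2 -= lr * g2 / len(masks)

# Check: classification must survive any single hidden-unit failure.
for j, m in enumerate(masks):
    _, out = forward(X, W1, W2, m)
    ok = np.all((out > 0.5) == (y > 0.5))
    print(f"fault pattern {j}: {'correct' if ok else 'WRONG'}")
```

Training against the fault patterns acts as a regularizer that spreads the computation across hidden units rather than concentrating it in a few, which is consistent with the reported improvement in generalization over unconstrained back-propagation.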