1989
DOI: 10.1109/35.41402

Integration of acoustic and visual speech signals using neural networks

Abstract: Automatic speech recognition systems rely almost exclusively on the acoustic speech signal and, consequently, these systems often perform poorly in noisy environments [1]. Attempts to clean up the acoustic input have had limited success [2]. Another approach is to use other sources of speech information, such as visual speech signals. … performance of the system was degraded by this early encoding. The need for early categorization of speech signals can be traced to the computational limitations of currently …

Cited by 163 publications (61 citation statements); references 34 publications.
“…Regardless of the signal-to-noise ratio, most systems perform better using both acoustical and optical sources of information than when using only one source of information (Bregler, Omohundro, et al, 1994; Bregler, Hild, et al, 1993; Mak & Allen, 1994; Petajan, 1984; Petajan, Bischoff, et al, 1988; Silsbee, 1994; Silsbee, 1993; Smith, 1989; Stork, Wolff, et al, 1992; Yuhas, Goldstein, et al, 1989). At a signal-to-noise ratio of zero with a 500-word task Silsbee (1993) achieves word accuracy recognition rates of 38%, 22%, and 58% respectively, using acoustical information, optical information, and both sources of information.…”
Section: Systems (mentioning)
confidence: 99%
“…This comparator may consist of a set of rules (e.g., if the top two phones from the acoustic recognizer is /t/ or /p/, then choose the one that has a higher ranking from the optical recognizer) (Petajan, Bischoff, et al, 1988) or a fuzzy logic integrator (e.g., provides linear weights associated with the acoustically and optically recognized phones) (Silsbee, 1993; Silsbee, 1994). The second approach performs recognition using a vector that includes both acoustical and optical information, such systems typically use neural networks to combine the optical information with the acoustic to improve the signal-to-noise ratio before phonemic recognition (Yuhas, Goldstein, et al, 1989; Bregler, Omohundro, et al, 1994; Bregler, Hild, et al, 1993; Stork, Wolff, et al, 1992; Silsbee, 1994).…”
Section: Systems (mentioning)
confidence: 99%
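The comparator approach in the statement above (decision-level or "late" integration) can be illustrated with a minimal sketch. It assumes that each recognizer outputs one score per phone label. The phone set, the scores, the confusable pair, and the weight `w` are hypothetical values chosen for illustration. The linear weighting is a plain weighted sum and does not reconstruct Silsbee's fuzzy logic integrator.

```python
from typing import Dict, FrozenSet

# Hypothetical: phone label -> recognizer score in [0, 1].
Scores = Dict[str, float]


def rule_based_fusion(acoustic: Scores, optical: Scores,
                      confusable: FrozenSet[str] = frozenset({"t", "p"})) -> str:
    """Rule in the spirit of the quoted example: if the acoustic recognizer's
    top two phones are both in a confusable set (here /t/ and /p/), pick the
    one the optical recognizer ranks higher; otherwise keep the acoustic winner."""
    top_two = sorted(acoustic, key=acoustic.get, reverse=True)[:2]
    if set(top_two) <= confusable:
        return max(top_two, key=lambda ph: optical.get(ph, 0.0))
    return top_two[0]


def weighted_fusion(acoustic: Scores, optical: Scores, w: float = 0.7) -> str:
    """Linear weighting of per-phone scores: w * acoustic + (1 - w) * optical.
    Phones are sorted so that ties resolve deterministically."""
    phones = sorted(set(acoustic) | set(optical))
    combined = {ph: w * acoustic.get(ph, 0.0) + (1 - w) * optical.get(ph, 0.0)
                for ph in phones}
    return max(combined, key=combined.get)
```

With hypothetical scores `acoustic = {"t": 0.41, "p": 0.39, "k": 0.20}` and `optical = {"t": 0.10, "p": 0.80, "k": 0.10}`, the acoustic recognizer narrowly prefers /t/. Both fusion functions return `"p"` because the lips clearly show a bilabial closure. Setting `w=1.0` ignores the visual channel and returns `"t"`.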
“…Our bimodal database consists of subjects of var- … [10,20,19,5,16,12]. Most approaches attempt to show that computer lip reading is able to improve speech recognition, especially in noisy environments.…”
Section: Large Spontaneous Speech Dialog System (mentioning)
confidence: 99%
“…Equivalently for a neural network, the number of units and the number of layers would be adjusted, as well as the weights' values. The third group is increasingly addressed and corresponds to the combination of at least two sources of information to improve the robustness (Yuhas et al 1989) (McGurk and MacDonald 1976). For example, the speaker's lips movements and their corresponding acoustic signals are processed and integrated to improve the robustness of the speech recognition systems.…”
Section: Classical Proposed Solutions in Speech Recognition Robustness (mentioning)
confidence: 99%
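The integration of lip movements with acoustic signals described in these statements can also happen at the feature level ("early" integration). In that case, one joint vector per time frame is passed to a neural network, the family in which Yuhas et al. (1989) is cited. The minimal sketch below makes several assumptions. The per-frame acoustic and visual feature vectors, their sizes, and the one-hidden-layer network with random, untrained weights are all hypothetical. The sketch only shows the data flow and is not the architecture of any cited system.

```python
import math
import random
from typing import List

Vector = List[float]


def concat_features(acoustic: Vector, visual: Vector) -> Vector:
    """Early integration: join both modalities into one input vector per frame."""
    return acoustic + visual


class TinyMLP:
    """Hypothetical one-hidden-layer network (tanh hidden units, softmax output).
    Weights are random and untrained; this illustrates the forward pass only."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]

    def forward(self, x: Vector) -> Vector:
        if len(x) != len(self.w1[0]):
            raise ValueError("input size does not match the network")
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        logits = [sum(w * hi for w, hi in zip(row, hidden)) for row in self.w2]
        m = max(logits)  # subtract the max before exp for numerical stability
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]  # class probabilities summing to 1


# Hypothetical frame: 12 acoustic features + 4 lip-shape features -> 3 classes.
frame = concat_features([0.2] * 12, [0.5] * 4)
probs = TinyMLP(n_in=16, n_hidden=8, n_out=3).forward(frame)
```

Feeding the joint vector to one network lets it learn cross-modal correlations directly. The cost is that both streams must be time-aligned frame by frame, which decision-level comparators do not require.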