1995
DOI: 10.1126/science.270.5234.303

Speech Recognition with Primarily Temporal Cues

Abstract: Nearly perfect speech recognition was observed under conditions of greatly reduced spectral information. Temporal envelopes of speech were extracted from broad frequency bands and were used to modulate noises of the same bandwidths. This manipulation preserved temporal envelope cues in each band but restricted the listener to severely degraded information on the distribution of spectral energy. The identification of consonants, vowels, and words in simple sentences improved markedly as the number of bands incr…
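The band-envelope manipulation described in the abstract can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: the band edges, the 50 Hz envelope cutoff, the rectify-and-smooth envelope extractor, the second-order band-pass filters, the toy input signal, and all function names are assumptions chosen for brevity.

```python
import math
import random


def bandpass(signal, fs, lo, hi):
    """Second-order (biquad) band-pass filter passing roughly lo..hi Hz."""
    f0 = math.sqrt(lo * hi)            # geometric centre of the band
    q = f0 / (hi - lo)                 # quality factor from the bandwidth
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b0, b2 = alpha, -alpha             # the x[n-1] coefficient is zero
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in signal:
        y = (b0 * x + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def envelope(signal, fs, cutoff_hz):
    """Temporal envelope: full-wave rectify, then one-pole low-pass (assumed method)."""
    a = math.exp(-2 * math.pi * cutoff_hz / fs)
    env, y = [], 0.0
    for x in signal:
        y = (1 - a) * abs(x) + a * y
        env.append(y)
    return env


def noise_vocode(signal, fs, edges, env_cutoff=50.0, seed=0):
    """Replace each band of `signal` with same-band noise carrying that band's envelope."""
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in signal]
    out = [0.0] * len(signal)
    for lo, hi in zip(edges[:-1], edges[1:]):
        env = envelope(bandpass(signal, fs, lo, hi), fs, env_cutoff)
        carrier = bandpass(noise, fs, lo, hi)   # noise limited to the same band
        for i, (e, c) in enumerate(zip(env, carrier)):
            out[i] += e * c                     # envelope modulates the noise
    return out


# Toy input (hypothetical): a 440 Hz tone with a slow 4 Hz amplitude modulation.
FS = 16000
EDGES = [100, 800, 1500, 2500, 4000]            # four illustrative bands, in Hz
toy = [
    (1 + 0.8 * math.sin(2 * math.pi * 4 * n / FS)) * math.sin(2 * math.pi * 440 * n / FS)
    for n in range(FS // 4)
]
vocoded = noise_vocode(toy, FS, EDGES)
```

Changing the length of `EDGES` changes the number of bands, which is the variable the abstract reports as driving identification performance. A real experiment would use recorded speech and steeper filters than this sketch does.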

Cited by 2,677 publications (2,541 citation statements)
References 21 publications
“…Although most studies of speech recognition focus on information contained in speech energy in the spectral domain, substantial evidence confirms that amplitude modulations in the speech signal carry information important for communication (Rosen 1992; Shannon et al 1995). The speech signal contains varying degrees of envelope fluctuation, which convey different types of information.…”
Section: Introduction
mentioning
confidence: 99%
“…In some studies, near-normal recognition of speech that contains little spectral information has been achieved when the speech temporal envelope is preserved in a relatively small number of frequency channels. Indeed, Shannon et al (1995) reported that four-channel "vocoded" speech, which had very limited frequency information but relatively intact temporal envelope information, was highly intelligible.…”
Section: Introduction
mentioning
confidence: 99%
“…Selectively degrading modulation frequencies near the syllable rate (4-16 Hz) degrades participants' ability to identify consonants and to understand sentences (Drullman et al, 1994). In contrast, speech stimuli that are processed to leave only relatively slow (below 40 Hz) temporal modulations enable near-perfect speech intelligibility (Shannon et al, 1995). When the amplitude envelope is analysed in terms of its constituent temporal modulation frequencies, the dominant modulation frequencies are around 4-6 Hz, reflecting the sequential rate of words and syllables (Drullman, 2006).…”
Section: Introduction
mentioning
confidence: 99%
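To make concrete what analysing an amplitude envelope "in terms of its constituent temporal modulation frequencies" involves, here is a minimal Python sketch. It is an assumption-laden toy, not a method taken from any of the cited papers: the rectify-and-smooth envelope extractor, the 40 Hz cutoff used as its default, the synthetic 250 Hz carrier with an imposed 5 Hz modulation, and the function names are all hypothetical choices.

```python
import math


def amplitude_envelope(signal, fs, cutoff_hz=40.0):
    """Full-wave rectify, then smooth with a one-pole low-pass filter."""
    a = math.exp(-2 * math.pi * cutoff_hz / fs)
    env, y = [], 0.0
    for x in signal:
        y = (1 - a) * abs(x) + a * y
        env.append(y)
    return env


def modulation_spectrum(env, fs, freqs):
    """Magnitude of the mean-removed envelope at each modulation frequency (direct DFT)."""
    mean = sum(env) / len(env)
    centred = [e - mean for e in env]
    spectrum = {}
    for f in freqs:
        w = 2 * math.pi * f / fs
        re = sum(c * math.cos(w * n) for n, c in enumerate(centred))
        im = sum(c * math.sin(w * n) for n, c in enumerate(centred))
        spectrum[f] = math.hypot(re, im) / len(centred)
    return spectrum


# Toy signal (hypothetical): a 250 Hz carrier whose amplitude fluctuates at 5 Hz,
# standing in for a syllable-rate modulation.
fs = 2000
toy = [
    (1 + 0.8 * math.sin(2 * math.pi * 5 * n / fs)) * math.sin(2 * math.pi * 250 * n / fs)
    for n in range(2 * fs)                      # two seconds of signal
]
env = amplitude_envelope(toy, fs)
spec = modulation_spectrum(env, fs, range(1, 41))   # 1..40 Hz modulation rates
dominant = max(spec, key=spec.get)                  # modulation rate with most energy
```

For this synthetic input, `dominant` recovers the imposed 5 Hz rate. With real speech the spectrum would be broader, and the statement above places its peak around 4-6 Hz.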
“…We use noise-vocoded speech, which is an acoustic distortion that preserves temporal information while removing the temporal fine structure and spectral detail of speech (Shannon et al, 1995). Although initially unintelligible, noise-vocoded sentences can be readily understood following a period of training.…”
mentioning
confidence: 99%