IJCNN'01. International Joint Conference on Neural Networks. Proceedings (Cat. No.01CH37222)
DOI: 10.1109/ijcnn.2001.938830

Speech segregation based on sound localization

Abstract: We study the cocktail-party effect, which refers to the ability of a listener to attend to a single talker in the presence of adverse acoustical conditions. It has been observed that this ability improves in the presence of binaural cues. In this paper, we explore a technique for speech segregation based on sound localization cues. The auditory masking phenomenon motivates an "ideal" binary mask in which time-frequency regions that correspond to the weak signal are canceled. In our model we estimate this binar…
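
The "ideal" binary mask mentioned in the abstract can be illustrated with a minimal sketch. It assumes that the energy of the target and of the interference is known separately in every time-frequency (T-F) unit, which is only possible when the premixed signals are available; the paper itself estimates the mask from localization cues instead. The function names, the 0 dB threshold, and the toy energy grids are assumptions made for illustration and are not taken from the paper.

```python
import math

def ideal_binary_mask(target_energy, interference_energy, threshold_db=0.0):
    """Return a 0/1 grid: 1 where the target dominates a T-F unit.

    Both inputs are lists of rows (frequency channels) of per-frame
    energies. The 0 dB default threshold is an illustrative assumption.
    """
    eps = 1e-12  # avoids log(0) and division by zero in silent units
    mask = []
    for t_row, i_row in zip(target_energy, interference_energy):
        row = []
        for t, i in zip(t_row, i_row):
            local_snr_db = 10.0 * math.log10((t + eps) / (i + eps))
            row.append(1 if local_snr_db > threshold_db else 0)
        mask.append(row)
    return mask

def apply_mask(mixture, mask):
    """Cancel the T-F units in which the target is the weaker signal."""
    return [[x * m for x, m in zip(x_row, m_row)]
            for x_row, m_row in zip(mixture, mask)]

# Hypothetical 2-channel x 2-frame energy grids.
target = [[4.0, 0.1], [0.2, 9.0]]
interference = [[1.0, 2.0], [3.0, 0.5]]
mixture = [[t + i for t, i in zip(t_row, i_row)]
           for t_row, i_row in zip(target, interference)]

mask = ideal_binary_mask(target, interference)
masked = apply_mask(mixture, mask)
```

In this toy grid the target dominates the two diagonal units, so only those units survive the masking.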

Cited by 127 publications (250 citation statements)
References 10 publications
“…The processes underlying spatial hearing can be used for the segregation of speech by increasing its SNR [6]. Part of our future work will be directed towards the enhancement of speech recognition systems with the aid of SSL.…”
Section: Discussion (mentioning)
confidence: 99%
“…Sounds can provide information comparable to visual stimuli in scenarios where vision is impeded. SSL can help robots to cope with environment hazards and to communicate [6]. A meta-objective of artificial SSL systems is their portability to different robots.…”
Section: Introduction (mentioning)
confidence: 99%
“…SNR values for the separated target speech also indicate good separation, and informal listening tests found that target speech extracted by the system was of good quality. SNR performance reported here (10.03 dB at the smallest separation) also compares well with those of [9], although direct comparison is difficult due to differing stimuli and spatial separations. The energy-based mechanism allowing unvoiced segments to be represented in the RTNN binary mask successfully included the utterances' fricatives.…”
Section: Discussion (supporting)
confidence: 65%
“…Thus, across-frequency grouping by ITD ought to provide a powerful mechanism for segregating multiple voices. Indeed, across-frequency grouping by ITD has been employed by computational models of voice separation (e.g., [8,9]). …”
Section: Location (mentioning)
confidence: 99%
“…Beamforming attempts to improve SNR of a source using directional information [3,8]. Other approaches perform a time-frequency decomposition of the mixture signals and use between-channel level and time delay differences in each time-frequency (T-F) unit to estimate an output signal that originates from a particular direction [8,12,14,18]. These systems use localization information as a primary cue to achieve source segregation, and show rapid performance degradation as reverberation is added to the recordings.…”
Section: Introduction (mentioning)
confidence: 99%
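
The between-channel level and time delay differences described in the last statement can be sketched for a single T-F unit as follows. This is an illustration under stated assumptions, not the method of any cited system: it assumes each ear signal has already been decomposed into complex T-F coefficients (for example by a short-time Fourier transform), and the function name `binaural_cues`, the sign convention, and the example values are hypothetical.

```python
import cmath
import math

def binaural_cues(left, right, freq_hz):
    """Estimate level and time differences for one T-F unit.

    left, right: complex T-F coefficients of the two channels.
    freq_hz: centre frequency of the unit in Hz (must be > 0).
    Returns (ild_db, itd_s); positive values mean the left channel
    is louder or leads in time (an assumed sign convention).
    """
    eps = 1e-12  # guards against log(0) in silent units
    ild_db = 20.0 * math.log10((abs(left) + eps) / (abs(right) + eps))
    # Phase of the cross-spectrum, wrapped to (-pi, pi].
    ipd = cmath.phase(left * right.conjugate())
    # Converting phase to delay is only unambiguous when the true
    # delay is shorter than half a period of freq_hz.
    itd_s = ipd / (2.0 * math.pi * freq_hz)
    return ild_db, itd_s

# Hypothetical unit at 500 Hz: the right channel is half the
# amplitude of the left and lags it by 0.3 ms.
freq = 500.0
delay = 0.0003
left = complex(1.0, 0.0)
right = 0.5 * cmath.exp(-1j * 2.0 * math.pi * freq * delay)

ild_db, itd_s = binaural_cues(left, right, freq)
```

A unit whose cues point to the target direction would then be kept and the others canceled, which is one way a binary mask could be estimated from localization information.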