Abstract: We study the cocktail-party effect, which refers to the ability of a listener to attend to a single talker in the presence of adverse acoustical conditions. It has been observed that this ability improves in the presence of binaural cues. In this paper, we explore a technique for speech segregation based on sound localization cues. The auditory masking phenomenon motivates an "ideal" binary mask in which time-frequency regions that correspond to the weak signal are canceled. In our model we estimate this binar…
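The "ideal" binary mask this abstract motivates is commonly defined per time-frequency (T-F) unit by comparing target and interference energy against a local SNR criterion. A minimal sketch (the function name and the 0 dB criterion are illustrative; the abstract is truncated before the paper's own estimation details):

```python
import numpy as np

def ideal_binary_mask(target_tf, interf_tf, lc_db=0.0):
    """Ideal binary mask: a T-F unit is kept (1) when the target's local
    energy exceeds the interference's by the local criterion lc_db,
    and canceled (0) otherwise."""
    # small floor avoids log of zero in silent units
    snr_db = 20.0 * np.log10((np.abs(target_tf) + 1e-12) /
                             (np.abs(interf_tf) + 1e-12))
    return (snr_db > lc_db).astype(float)

# toy magnitudes for 2 frames x 3 frequency channels
target = np.array([[1.0, 0.1, 0.5],
                   [0.2, 0.9, 0.5]])
interf = np.array([[0.1, 1.0, 0.5],
                   [0.9, 0.1, 0.5]])
mask = ideal_binary_mask(target, interf)
# mask: [[1, 0, 0], [0, 1, 0]] -- equal-energy units fall below the 0 dB criterion
```

Applying the mask to the mixture's T-F representation and resynthesising yields the segregated target; with a strict inequality, units where neither source dominates are canceled.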
“…The processes underlying spatial hearing can be used for the segregation of speech by increasing its SNR [6]. Part of our future work will be directed towards the enhancement of speech recognition systems with the aid of SSL.…”
Section: Discussion (mentioning)
confidence: 99%
“…Sounds can provide information comparable to visual stimuli in scenarios where vision is impeded. SSL can help robots to cope with environment hazards and to communicate [6]. A meta-objective of artificial SSL systems is their portability to different robots.…”
Abstract. This paper presents a spiking neural network (SNN) for binaural sound source localisation (SSL). The cues used for SSL were the interaural time (ITD) and level (ILD) differences. ITDs and ILDs were extracted with models of the medial superior olive (MSO) and the lateral superior olive (LSO). The MSO and LSO outputs were integrated in a model of the inferior colliculus (IC). The connection weights from the MSO and LSO neurons to the IC neurons were estimated using Bayesian inference. This inference process allowed the algorithm to perform robustly on a robot with ∼40 dB of ego-noise. The results showed that the algorithm is capable of differentiating sounds with an accuracy of 15°.
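The MSO is modelled in this line of work with spiking coincidence detectors. As a rough non-spiking analogue (not the paper's SNN), the ITD cue can be estimated as the lag that maximises the cross-correlation of the two ear signals:

```python
import numpy as np

def estimate_itd(left, right, fs):
    """Estimate the interaural time difference (seconds) as the peak lag
    of the cross-correlation; positive values mean the left ear leads."""
    corr = np.correlate(right, left, mode="full")
    lag = np.argmax(corr) - (len(left) - 1)
    return lag / fs

fs = 16000
t = np.arange(0, 0.02, 1.0 / fs)
tone = np.sin(2 * np.pi * 500 * t)
delay = 8                                        # samples, i.e. 0.5 ms
left = np.concatenate([tone, np.zeros(delay)])   # source on the left:
right = np.concatenate([np.zeros(delay), tone])  # right ear hears it late
itd = estimate_itd(left, right, fs)              # 8 / 16000 = 0.5 ms
```

For periodic signals the cross-correlation peak is ambiguous once the true lag approaches the signal period, which is one reason the paper fuses ITDs with ILDs (via the LSO model) before the IC stage.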
“…SNR values for the separated target speech also indicate good separation, and informal listening tests found that target speech extracted by the system was of good quality. SNR performance reported here (10.03 dB at the smallest separation) also compares well with those of [9], although direct comparison is difficult due to differing stimuli and spatial separations. The energy-based mechanism allowing unvoiced segments to be represented in the RTNN binary mask successfully included the utterances' fricatives.…”
Section: Discussion (supporting)
confidence: 65%
“…Thus, across-frequency grouping by ITD ought to provide a powerful mechanism for segregating multiple voices. Indeed, across-frequency grouping by ITD has been employed by computational models of voice separation (e.g., [8,9]). …”
Abstract. A speech separation system is described in which sources are represented in a joint interaural time difference-fundamental frequency (ITD-F0) cue space. Traditionally, recurrent timing neural networks (RTNNs) have been used only to extract periodicity information; in this study, this type of network is extended in two ways. Firstly, a coincidence detector layer is introduced, each node of which is tuned to a particular ITD; secondly, the RTNN is extended to become two-dimensional to allow periodicity analysis to be performed at each best-ITD. Thus, one axis of the RTNN represents F0 and the other ITD, allowing sources to be segregated on the basis of their separation in ITD-F0 space. Source segregation is performed within individual frequency channels without recourse to across-channel estimates of F0 or ITD that are commonly used in auditory scene analysis approaches. The system is evaluated on spatialised speech signals using energy-based metrics and automatic speech recognition.
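The two-dimensional ITD-F0 idea can be sketched without a spiking network: fix a candidate ITD with a coincidence product of the (half-wave rectified) ear signals, then run a periodicity analysis on that product, giving a map whose axes are ITD and F0 period. This is a simplified non-spiking stand-in for the RTNN; all names and parameter values below are illustrative:

```python
import numpy as np

def itd_f0_map(left, right, max_itd=8, min_period=20, max_period=60):
    """Joint ITD-periodicity map. Rows index candidate ITDs in samples
    (coincidence products), columns index autocorrelation lags
    (candidate F0 periods). min_period skips the trivially high
    short-lag autocorrelation values."""
    # auditory-nerve-like half-wave rectification
    l, r = np.maximum(left, 0.0), np.maximum(right, 0.0)
    n = len(l)
    amap = np.zeros((2 * max_itd + 1, max_period - min_period + 1))
    for i, d in enumerate(range(-max_itd, max_itd + 1)):
        # coincidence detector layer: align the ears at best-ITD d
        coinc = l[d:] * r[:n - d] if d >= 0 else l[:n + d] * r[-d:]
        # periodicity analysis of the coincidence output
        for j, p in enumerate(range(min_period, max_period + 1)):
            amap[i, j] = np.dot(coinc[:-p], coinc[p:])
    return amap

fs = 8000
samples = np.arange(800)
base = np.sin(2 * np.pi * 200 * samples / fs)  # 200 Hz: period 40 samples
left, right = np.roll(base, 4), base           # left ear 4 samples late
amap = itd_f0_map(left, right)
i, j = np.unravel_index(np.argmax(amap), amap.shape)
# i - max_itd recovers the imposed 4-sample ITD; j + min_period the 40-sample period
```

A source then appears as a peak in this space, so two talkers separated in either ITD or F0 occupy distinct regions, which is the basis for the segregation described above.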
“…Beamforming attempts to improve the SNR of a source using directional information [3,8]. Other approaches perform a time-frequency decomposition of the mixture signals and use between-channel level and time delay differences in each time-frequency (T-F) unit to estimate an output signal that originates from a particular direction [8,12,14,18]. These systems use localization information as a primary cue to achieve source segregation, and show rapid performance degradation as reverberation is added to the recordings.…”
Approaches to binaural and stereo speech segregation have often assumed that localization information can be used as a primary cue to achieve segregation of a target signal. Results produced by these systems degrade significantly in the presence of room reverberation. In this work, we present an alternative framework to achieve localization of groups of time-frequency units. We show that grouping across time and frequency allows the use of localization as an important cue for sequential grouping of time-frequency objects. We analyze the level of time-frequency grouping needed to achieve accurate object localization and show preliminary binaural segregation results using the proposed framework. Results indicate that both localization and segregation performance can be improved by grouping across time and frequency.
Index Terms - Binaural sound localization, speech segregation, reverberation, computational auditory scene analysis.
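The per-unit localization these abstracts describe typically starts from the interaural phase difference (IPD) of an STFT decomposition, converted to a time delay per T-F unit. A sketch under illustrative assumptions (frame/hop sizes and the direct IPD-to-delay conversion are not from any specific cited system):

```python
import numpy as np

def tf_unit_delays(left, right, fs, frame=256, hop=128):
    """Per-T-F-unit time delay (seconds) from the interaural phase
    difference of an STFT decomposition."""
    win = np.hanning(frame)
    freqs = np.fft.rfftfreq(frame, 1.0 / fs)
    n_frames = 1 + (len(left) - frame) // hop
    delays = np.zeros((n_frames, len(freqs)))
    for t in range(n_frames):
        lspec = np.fft.rfft(win * left[t * hop:t * hop + frame])
        rspec = np.fft.rfft(win * right[t * hop:t * hop + frame])
        ipd = np.angle(lspec * np.conj(rspec))  # interaural phase difference
        with np.errstate(divide="ignore", invalid="ignore"):
            delays[t] = ipd / (2 * np.pi * freqs)  # DC bin -> NaN/inf
    return delays

fs = 16000
samples = np.arange(1024)
sig = np.sin(2 * np.pi * 500 * samples / fs)
left, right = sig, np.roll(sig, 4)   # right ear 4 samples (0.25 ms) late
delays = tf_unit_delays(left, right, fs)
# at the 500 Hz bin (index 8) the estimate is ~4/16000 s
```

Phase wrapping makes the per-bin conversion ambiguous above roughly 1/(2·ITD) Hz, and reverberation corrupts individual units, which motivates the paper's move from per-unit decisions to localization of grouped T-F objects.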