Blind Separation of Speech Mixtures via Time-Frequency Masking

Yılmaz, Özgür; Rickard, Scott

doi:10.1109/tsp.2004.828896

Cited by 1,278 publications

(1,235 citation statements)

References 15 publications

Supporting

Mentioning: 1,222

Contrasting

Unclassified


“…We used the FD-ICA algorithm with the MuSIC-based permutation alignment algorithm described by Mitianoudis and Davies [9], setting the STFT frame size to 2048 samples, which was previously found to be appropriate for this algorithm at a 16kHz sampling rate [9,33]. For the DUET algorithm we used an STFT frame size of 1024 samples, which was found by Yilmaz and Rickard [11] to give the best separation performance at 16 kHz. For the proposed adaptive stereo basis algorithm, we used an adaptive basis frame size of 512 samples, to be consistent with preliminary experiments which indicated that this would be sufficient for separation at a 16 kHz sampling rate with reasonable room reverberation times [33].…”

Section: Methods

mentioning

confidence: 99%
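The frame sizes quoted above are easier to compare as durations. The following is a minimal sketch of that arithmetic at the stated 16 kHz sampling rate; the function name `frame_duration_ms` is hypothetical and not taken from any of the cited papers.

```python
def frame_duration_ms(frame_size: int, sample_rate_hz: float) -> float:
    """Length of one STFT frame in milliseconds."""
    return 1000.0 * frame_size / sample_rate_hz


# Frame sizes as quoted in the citation statement, all at 16 kHz.
for name, frame_size in [("FD-ICA", 2048), ("DUET", 1024), ("ASB", 512)]:
    duration = frame_duration_ms(frame_size, 16000)
    print(f"{name}: {frame_size} samples = {duration:.0f} ms")
```

Under these figures the three methods analyse windows of 128 ms, 64 ms, and 32 ms respectively.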

“…, K. Thus the diagonal elements of $H^{(p)}$ are one or zero depending on whether or not a transform component is considered to belong to the subspace $E_p$ corresponding to the p-th source. Note that, in contrast to the time-frequency mask used in the DUET algorithm [11], which depends both on the frequency bin index f and the time frame index t, the ASB masking matrix $H^{(p)}$ operates across basis pair indices k only and is independent of the time frame.…”

Section: P With the Mask Values Given By

mentioning

confidence: 99%
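To make the contrast concrete, a DUET-style binary mask can be sketched as follows. This is an illustrative sketch only, assuming the anechoic two-channel model in which a bin dominated by source j satisfies X2 = a_j exp(-i ω δ_j) X1, and assuming the per-source attenuation and delay pairs are already known. The function name `duet_binary_masks` and the nested-list data layout are hypothetical.

```python
import cmath


def duet_binary_masks(X1, X2, omegas, sources):
    """Assign every time-frequency bin to exactly one source.

    X1, X2  : STFTs of the two channels as nested lists [frequency][time]
    omegas  : angular frequency of each frequency row
    sources : list of (attenuation a, delay delta) pairs, assumed known
    Returns one 0/1 mask per source, each shaped like X1.
    """
    masks = [[[0] * len(row) for row in X1] for _ in sources]
    for f, omega in enumerate(omegas):
        for t in range(len(X1[f])):
            # Mismatch between the bin and each source's mixing model,
            # normalised by (1 + a^2).
            costs = [
                abs(a * cmath.exp(-1j * omega * delta) * X1[f][t] - X2[f][t]) ** 2
                / (1 + a * a)
                for a, delta in sources
            ]
            masks[costs.index(min(costs))][f][t] = 1
    return masks


# One frequency row (omega = 1 rad per sample), two time frames.
sources = [(1.0, 0.0), (0.5, 1.0)]
X1 = [[1 + 0j, 2 + 0j]]
X2 = [[1 + 0j, 2 * 0.5 * cmath.exp(-1j)]]
masks = duet_binary_masks(X1, X2, [1.0], sources)
print(masks)
```

The mask is indexed by both f and t, which is the property the citation statement contrasts with the time-independent ASB masking matrix.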

“…Both FD-ICA and DUET suffer from phase ambiguities in the upper frequencies. To avoid this problem, DUET was designed under the assumption that the microphone separation, d, is small enough so that phase ambiguities do not arise [11]. Clearly, this assumption cannot always be satisfied, particularly when the problem is truly blind (i.e.…”

Section: Algorithm Comparison

mentioning

confidence: 99%
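The small-spacing assumption can be put in numbers. The sketch below assumes the usual condition that the inter-microphone phase difference stays within ±π, that is ω d / c < π, which gives d < c / (2 f_max). The speed of sound (343 m/s) and the function name `max_mic_spacing_m` are assumptions made for illustration.

```python
def max_mic_spacing_m(max_freq_hz: float, speed_of_sound: float = 343.0) -> float:
    """Largest spacing d (metres) for which the phase difference between the
    two microphones stays below pi at every frequency up to max_freq_hz."""
    return speed_of_sound / (2.0 * max_freq_hz)


# At a 16 kHz sampling rate the highest representable frequency is 8 kHz.
spacing = max_mic_spacing_m(8000.0)
print(f"maximum spacing: {spacing * 100:.2f} cm")
```

At 16 kHz this bound is only about 2 cm, which illustrates why the assumption is hard to guarantee when the array geometry is unknown.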

“…Histograms obtained from anechoic mixtures are typically well localised, with distinct peak regions corresponding to the sources, while they are more spread out for echoic mixtures [11]. Conversely, ASB does not make any specific assumptions regarding the mixing channel.…”

Section: Algorithm Comparison

mentioning

confidence: 99%
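A histogram of this kind can be sketched as below. This is a simplified one-dimensional version over relative delay only, weighted by |X1 X2|, and assumes the anechoic model in which the delay of a bin is the negated phase of X2/X1 divided by ω. The function name `delay_histogram`, the bin count, and the delay range are hypothetical choices, not values from the cited papers.

```python
import cmath


def delay_histogram(X1, X2, omegas, n_bins=4, max_delay=2.0):
    """Power-weighted histogram of per-bin relative delay estimates.

    X1, X2 : STFTs of the two channels as nested lists [frequency][time]
    omegas : angular frequency of each frequency row
    Bins cover the interval [-max_delay, max_delay).
    """
    hist = [0.0] * n_bins
    width = 2.0 * max_delay / n_bins
    for f, omega in enumerate(omegas):
        for x1, x2 in zip(X1[f], X2[f]):
            if x1 == 0 or omega == 0:
                continue  # the delay is undefined for these bins
            delta = -cmath.phase(x2 / x1) / omega
            if -max_delay <= delta < max_delay:
                hist[int((delta + max_delay) / width)] += abs(x1 * x2)
    return hist


# Two bins with delay 0.5 and one with delay -1.5, at omega = 1.
X1 = [[1 + 0j, 1 + 0j, 1 + 0j]]
X2 = [[cmath.exp(-0.5j), cmath.exp(-0.5j), cmath.exp(1.5j)]]
hist = delay_histogram(X1, X2, [1.0])
print(hist)
```

With anechoic data the weight concentrates in a few bins, one per source. Reverberation spreads the delay estimates across bins, which is the effect the citation statement describes.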

“…Another approach that has been found to be successful in practical applications on stereo (two-microphone) anechoic mixtures is the degenerate unmixing estimation technique (DUET) [10,11]. Here the STFT is again used to transform the signal into the time-frequency domain.…”

mentioning

confidence: 99%


An adaptive stereo basis method for convolutive blind audio source separation

et al. 2008


We consider the problem of convolutive blind source separation of stereo mixtures, where a pair of microphones records mixtures of sound sources that are convolved with the impulse response between each source and sensor. We propose an Adaptive Stereo Basis (ASB) source separation method for such convolutive mixtures, using an adaptive transform basis which is learned from the stereo mixture pair. The stereo basis vector pairs of the transform are grouped according to the estimated relative delay between the left and right channels for each basis, and the sources are then extracted by projecting the transformed signal onto the subspace corresponding to each group of basis vector pairs. The performance of the proposed algorithm is compared with FD-ICA and DUET under different reverberation and noise conditions, using both objective distortion measures and formal listening tests. The results indicate that the proposed stereo coding method is competitive with both these algorithms at short and intermediate reverberation times, and offers significantly improved performance at low noise and short reverberation times.

Preprint submitted to Elsevier Science, 10 August 2007. This is the author's version of a work that was accepted for publication in Neurocomputing; changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms, may not be reflected in it. A definitive version was subsequently published in Neurocomputing 71(10-12), 2087-2097, June 2008. doi: 10.1016/j.neucom.2007


Blind Source Separation and Blind Mixture Identification Methods

Deville

2016

Wiley Encyclopedia of Electrical and Electronics Engineering


Blind source separation (BSS) is a generic signal processing problem. BSS methods aim to estimate a set of unknown source signals, by using a set of available signals that are mixtures of the source signals to be restored, with limited or no knowledge of the mixing transform (i.e., the transform of source signals that yields their mixtures). BSS methods were introduced in the 1980s and then quickly expanded. Various books provide a detailed description of BSS methods, or at least of some classes of such methods defined hereafter, such as independent component analysis, sparse component analysis, and nonnegative matrix factorization. Moreover, the BSS problem, focused on signal restoration, is closely linked to the estimation of the mixing transform, and thus to the problem often referred to as blind mixture identification (BMI). In this article, we overview the fields of BSS and BMI. We first define in more detail the considered goal (Section 1) and conditions of investigation (Section 2), and then we introduce the major classes of methods that make it possible to solve the considered problems. The presentation of BSS/BMI methods themselves and of typical applications is given in successive sections (Sections 3–7), where we progress from standard to more advanced configurations, in terms of properties of source signals and class of mixing transform.
