Single-Channel Speech-Music Separation for Robust ASR With Mixture Models

Demir, Cemil; Saraçlar, Murat; Cemgil, Ali Taylan

doi:10.1109/tasl.2012.2231072

Cited by 22 publications

(12 citation statements)

References 30 publications

“…The vector will be approximately sparse, providing the clipping level is high enough [17]. 2) Single channel denoising: In many applications people wish to separate clean speeches from various kinds of interferences like music noise, babble noise and vehicle noise [27], [31], etc. This is a typical SCSS problem and we need to learn from previous speeches and learn from noise data.…”

Section: Our Analysis of the SCSS Problem (mentioning)

confidence: 99%

“…However, the RIP condition guarantees a successful recovery of sparse vector providing it is sparse enough. The recovery task is fulfilled by solving the following sparse optimization problem (29) or its equivalent form (30) With a specified confidence level factor , problem (30) can be rewritten as (31) which can be easily solved by the OMP algorithm. Finally the impulsive samples are updated as (32) The remaining tiny Gaussian noise can be suppressed by a simple spectral subtraction approach.…”

Section: A. The FODAR-DOMP Algorithm Based on RTFD (mentioning)

confidence: 99%

A Robust Time-frequency Decomposition Model for Suppression of Mixed Gaussian-impulse Noise in Audio Signals

Tong, Zhou, Zhang, et al. 2014

IEEE/ACM Trans. Audio Speech Lang. Process.

In this paper, we propose a robust time-frequency decomposition (RTFD) model to restore audio signals degraded by sparse impulse noise mixed with small dense Gaussian noise. This kind of noise is very common especially in old-time recordings. The proposed RTFD model is based on the observation that these degraded audio signals mainly contain four parts, i.e., the quasi-periodic and voiced part, the aperiodic and transient part, the arbitrarily large impulse noise and the small dense Gaussian noise. Sparsity and local correlations of corresponding parts are exploited to solve the RTFD model. We also heuristically develop a discriminative orthogonal matching pursuit (DOMP) algorithm to more precisely estimate sparse representing vectors. Specifically, the DOMP algorithm divides the whole atom set into two subsets, i.e., the active subset and the passive subset. Atoms in two subsets are treated discriminatively since sparsity regularization terms are not equally weighted. Based on RTFD and DOMP, we have developed two algorithms, i.e., the fidelity-oriented algorithm and the articulation-oriented algorithm. The proposed algorithms achieve considerable performance on both synthetic and real noisy signals. Results show that the articulation-oriented algorithm using DOMP obviously outperforms other algorithms in heavier impulse noise situations.

Index Terms: Articulation-oriented, degraded, discriminative orthogonal matching pursuit (DOMP), Gaussian-impulse noise, restoration, robust time-frequency decomposition (RTFD).

“…Speech separation aims at extracting the speech signals of each speaker in a noisy mixture. It has many applications, for example in automatic speech recognition [1], hearing aids [2] or music processing [3]. In recent years, deep neural network (DNN)-based solutions have replaced model-based approaches because of the great progress they enabled [4][5][6][7][8].…”

Section: Introduction (mentioning)

confidence: 99%

Distributed Speech Separation in Spatially Unconstrained Microphone Arrays

Furnon, Serizel, Illina, et al. 2021

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Speech separation with several speakers is a challenging task because of the non-stationarity of the speech and the strong signal similarity between interferent sources. Current state-of-the-art solutions can separate well the different sources using sophisticated deep neural networks which are very tedious to train. When several microphones are available, spatial information can be exploited to design much simpler algorithms to discriminate speakers. We propose a distributed algorithm that can process spatial information in a spatially unconstrained microphone array. The algorithm relies on a convolutional recurrent neural network that can exploit the signal diversity from the distributed nodes. In a typical case of a meeting room, this algorithm can capture an estimate of each source in a first step and propagate it over the microphone array in order to increase the separation performance in a second step. We show that this approach performs even better when the number of sources and nodes increases. We also study the influence of a mismatch in the number of sources between the training and testing conditions.

“…Musical sound sources also often corrupt speech signals, which is relevant for separating speech in movies, radio shows, or home speaker speech recognition. The speech-music separation task has mainly been studied in simplified settings so far [2,3].…”

Section: Introduction (mentioning)

confidence: 99%

Joint Phoneme Alignment and Text-Informed Speech Separation on Highly Corrupted Speech

Schulze-Forster, Doire, Richard, et al. 2020

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
