2018 52nd Asilomar Conference on Signals, Systems, and Computers (2018)
DOI: 10.1109/acssc.2018.8645469

A new feature set for masking-based monaural speech separation

Cited by 7 publications
(3 citation statements)
References 14 publications
“…In the time domain, it is common to use the original representation of the waveform or use short time frames and extract some features, such as energy, entropy, and the Zero Crossing Rate (ZCR) [18]. While, in the frequency domain, many meaningful features can be extracted, including Short Time Fourier Transform (STFT), Mel-Frequency Cepstral Coefficients (MFCC) [19], Gammatone Frequency (GF), Gammatone Frequency Cepstral Coefficients (GFCC) [20], and Perceptual Linear Prediction (PLP) [21].…”
Section: Data Structure (mentioning)
confidence: 99%
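The time-domain features named in this statement can be illustrated with a minimal sketch. It assumes the usual textbook definitions of short-time energy (mean squared amplitude per frame) and zero-crossing rate (fraction of adjacent sample pairs whose sign differs). The function names, frame length, and hop size are hypothetical choices made for illustration and are not taken from the citing paper.

```python
def frame_signal(samples, frame_len, hop):
    """Split a sequence into overlapping frames, dropping any incomplete tail."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame (assumed definition)."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs (assumed definition)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


# Toy signal standing in for a waveform; real input would be audio samples.
signal = [1.0, -1.0, 1.0, -1.0, 0.5, 0.5, 0.5, 0.5]
frames = frame_signal(signal, frame_len=4, hop=4)
features = [(short_time_energy(f), zero_crossing_rate(f)) for f in frames]
```

On this toy input the first frame alternates sign at every sample and the second never changes sign, so the two frames sit at opposite ends of the zero-crossing range.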
“…There are other features that can be extracted from the spectrogram, such as the power spectrum, which shows the distribution of the power of the frequency components of the speech; Mel spectrum, which represents the spectrum in the Mel scale; and log power spectrum, in which the log operation is performed to the power spectrum in order to decrease the dynamic range, and ease the training process [18]. Mel-Frequency Cepstral Coefficients (MFCC) is another feature extracted by applying a Discrete Cosine Transform (DCT) to the log-compressed Mel scale power spectrum.…”
Section: A Spectrogram Based T-F Mapping Targets (mentioning)
confidence: 99%
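The chain described in this statement (power spectrum, log compression, then a DCT) can be sketched as follows. This is a rough illustration under stated assumptions: it uses a naive DFT with a 1/N power normalisation (one common convention), a small floor before the logarithm, and an unnormalised DCT-II. The Mel filterbank step is omitted, so `dct_ii` is shown on a plain log power spectrum; for MFCCs its input would be log Mel-band energies instead. All function names are hypothetical.

```python
import cmath
import math


def power_spectrum(frame):
    """One-sided power spectrum of a frame via a naive DFT (assumed 1/N scaling)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


def log_power_spectrum(frame, floor=1e-10):
    """Log-compress the power spectrum to reduce its dynamic range."""
    return [math.log(max(p, floor)) for p in power_spectrum(frame)]


def dct_ii(values):
    """Unnormalised DCT-II, the transform applied to log Mel energies for MFCCs."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, v in enumerate(values))
            for k in range(n)]


frame = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
log_spec = log_power_spectrum(frame)
cepstral_like = dct_ii(log_spec)
```

The floor inside `log_power_spectrum` only guards against taking the logarithm of zero; its value is an arbitrary choice for this sketch.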
“…The ideal binary mask (IBM) [3], ideal ratio mask (IRM) [4] were proposed as training targets for masking based supervised speech separation, while target magnitude spectrum (TMS) [5] was used as a training target in mapping based supervised speech separation. Furthermore, recent studies have examined the effect of different input acoustic features (e.g., gammatone based features versus spectral features) [6,7] on supervised speech separation in noisy and reverberant condition.…”
Section: Introduction (mentioning)
confidence: 99%
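The two masking targets named in this statement can be illustrated with a short sketch. It assumes the commonly used definitions: the IBM is 1 where the local SNR of a time-frequency unit exceeds a local criterion and 0 otherwise, and the IRM is the ratio of speech power to speech-plus-noise power raised to an exponent. The parameter names `lc_db`, `beta`, and `eps` and their default values are illustrative assumptions, not values taken from the cited papers.

```python
import math


def ideal_binary_mask(speech_power, noise_power, lc_db=0.0, eps=1e-12):
    """1.0 where the local SNR in dB exceeds lc_db, else 0.0 (assumed definition)."""
    mask = []
    for s, n in zip(speech_power, noise_power):
        snr_db = 10.0 * math.log10((s + eps) / (n + eps))
        mask.append(1.0 if snr_db > lc_db else 0.0)
    return mask


def ideal_ratio_mask(speech_power, noise_power, beta=0.5, eps=1e-12):
    """(S / (S + N)) ** beta per time-frequency unit (assumed definition)."""
    return [(s / (s + n + eps)) ** beta
            for s, n in zip(speech_power, noise_power)]


# Power values for a handful of time-frequency units (toy numbers).
speech = [4.0, 1.0, 1.0]
noise = [1.0, 4.0, 1.0]
ibm = ideal_binary_mask(speech, noise)
irm = ideal_ratio_mask(speech, noise)
```

Both functions need the clean speech and noise powers separately, which is why such masks serve as training targets: they can be computed from the premixed signals used to build the training set, and a model then learns to estimate them from the mixture alone.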