Jiaming Cheng scite author profile

Because traditional single-channel speech enhancement algorithms are sensitive to the environment and perform poorly, a speech enhancement algorithm based on attention-gated long short-term memory (LSTM) is proposed. To simulate human auditory perceptual characteristics, the algorithm divides the frequency band according to the Bark scale. Based on these bands, Bark frequency cepstral coefficients (BFCCs), their derivative features, and pitch-based features are extracted. Furthermore, because different noises affect clean speech differently, an attention mechanism is applied to select the information less polluted by noise, which helps reconstruct the clean speech. To adaptively reallocate the power ratio of speech and noise during construction of the ratio mask, the ideal ratio mask (IRM) with inter-channel correlation (ICC) is adopted as the learning target. In addition, to improve network performance, the algorithm introduces a multiobjective learning strategy that jointly optimizes the networks by using a voice activity detector (VAD). Subjective and objective experiments show that the proposed algorithm outperforms the baseline algorithms. In the real-time experiment, the proposed algorithm maintains high real-time performance and fast convergence speed.

INDEX TERMS: Speech enhancement, long short-term memory, attention mechanism, Bark scale.
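For readers unfamiliar with ratio masks, the following is a minimal sketch of the plain ideal ratio mask computed per frequency band. It is not the ICC-weighted variant the abstract describes, whose exact form is not given here. The function name `ideal_ratio_mask`, the exponent `beta`, the stabilizer `eps`, and the sample band powers are all assumptions made for illustration.

```python
def ideal_ratio_mask(speech_power, noise_power, beta=0.5, eps=1e-12):
    """Plain IRM per band: (S / (S + N)) ** beta, with values in [0, 1].

    speech_power and noise_power are per-band power values for one frame.
    beta=0.5 is a commonly used exponent; eps avoids division by zero.
    """
    return [
        (s / (s + n + eps)) ** beta
        for s, n in zip(speech_power, noise_power)
    ]


# Hypothetical per-band powers for a single frame (illustrative values only).
speech = [4.0, 1.0, 0.0]
noise = [0.0, 1.0, 2.0]

mask = ideal_ratio_mask(speech, noise)
print(mask)  # approximately [1.0, 0.707, 0.0]
```

A band dominated by speech gets a mask value near 1, a band dominated by noise gets a value near 0, and multiplying the noisy band energies by the mask attenuates the noisy bands.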


A Deep Adaptation Network for Speech Enhancement: Combining a Relativistic Discriminator With Multi-Kernel Maximum Mean Discrepancy

IEEE/ACM Trans. Audio Speech Lang. Process.

et al. 2021

DNN-based speech enhancement with self-attention on feature dimension

Zhao

2020

Multimed Tools Appl

Transfer Learning Algorithm for Enhancing the Unlabeled Speech

et al. 2020

IEEE Access

To improve the generalization ability of speech enhancement algorithms for unlabeled noisy speech, a speech enhancement transfer learning model based on the feature-attention multi-kernel maximum mean discrepancy (FA-MK-MMD) is proposed. To obtain a representation of the shared subspace (the part of the features extracted by the shared encoder that is related to clean speech) between the source domain (speech with known noise and labels) and the target domain (speech with unknown noise and no labels), the algorithm uses MK-MMD as the loss function to reduce the distribution difference between the two domains, which can improve adaptability to unknown noise. Furthermore, because different noises affect the shared-subspace representation differently, an attention mechanism is applied along the feature dimension to select the information less polluted by noise, which helps reconstruct the clean speech. For speech with unknown noise and no labels, the experiments demonstrate that the proposed algorithm improves the frequency-weighted segmental signal-to-noise ratio (fwsegSNR), the perceptual evaluation of speech quality (PESQ), and the short-time objective intelligibility (STOI) compared with the baseline algorithm.
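The MK-MMD loss mentioned above can be illustrated with a minimal sketch. It assumes Gaussian kernels with equal weights and uses the biased squared-MMD estimate between two small sets of feature vectors; the paper's kernel family, weights, and feature-attention step are not specified here and are not reproduced. The names `gaussian_kernel`, `multi_kernel`, and `mk_mmd_squared`, the bandwidths, and the sample vectors are hypothetical.

```python
import math


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def multi_kernel(x, y, bandwidths):
    """Equal-weight average of Gaussian kernels (an assumed weighting)."""
    return sum(gaussian_kernel(x, y, bw) for bw in bandwidths) / len(bandwidths)


def mk_mmd_squared(source, target, bandwidths=(0.5, 1.0, 2.0)):
    """Biased estimate of squared MMD under the averaged kernel."""
    def mean_kernel(a, b):
        total = sum(multi_kernel(x, y, bandwidths) for x in a for y in b)
        return total / (len(a) * len(b))

    return (
        mean_kernel(source, source)
        + mean_kernel(target, target)
        - 2.0 * mean_kernel(source, target)
    )


# Hypothetical encoder features for the two domains (illustrative values only).
source_features = [[0.0, 0.0], [1.0, 0.0]]
target_features = [[3.0, 3.0], [4.0, 3.0]]

same_domain = mk_mmd_squared(source_features, source_features)
cross_domain = mk_mmd_squared(source_features, target_features)
print(same_domain, cross_domain)
```

The estimate is zero when both sets are identical and grows as the two feature distributions move apart, which is why minimizing it pulls the source and target representations toward a shared subspace.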


Real-time speech enhancement algorithm for transient noise suppression

Xie

et al. 2020

Multimed Tools Appl