ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9054001

Phonetic Feedback for Speech Enhancement with and Without Parallel Speech Data

Abstract: While deep learning systems have gained significant ground in speech enhancement research, these systems have yet to make use of the full potential of deep learning systems to provide high-level feedback. In particular, phonetic feedback is rare in speech enhancement research even though it includes valuable top-down information. We use the technique of mimic loss to provide phonetic feedback to an off-the-shelf enhancement system, and find gains in objective intelligibility scores on CHiME-4 data. This techni…

Cited by 4 publications (4 citation statements) · References 19 publications
“…More recently, deep learning (DL) has become a popular and effective machine learning approach [32], [33], [34] and has brought significant progress in the SE field [35], [36], [37], [38], [39], [40], [41], [42], [43]. Based on the deep structure, an effective representation of the noisy input signal can be extracted and used to reconstruct a clean signal [44], [45], [46], [47], [48], [49], [50]. Various DL-based model structures, including deep denoising autoencoders [51], [52], fully connected neural networks [53], [54], [55], convolutional neural networks (CNNs) [56], [57], recurrent neural networks (RNNs), and long short-term memory (LSTM) [58], [59], [60], [61], [62], [63], have been used as the core model of an SE system and have been proven to provide better performance than traditional statistical and machine-learning methods.…”
Section: Introductionmentioning
confidence: 99%
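The statement above surveys DL-based SE models trained to map noisy input to a clean target. As a minimal sketch of that idea (not the configuration of any cited system), a toy single-hidden-layer denoising autoencoder can regress a noisy spectral frame onto its clean counterpart; all dimensions and the random initialization here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy denoising autoencoder for spectral speech enhancement:
# map a noisy log-magnitude frame to an estimate of the clean frame.
# Layer sizes are illustrative, not taken from any cited system.
n_freq, n_hidden = 64, 32
W1 = rng.normal(0, 0.1, (n_hidden, n_freq))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.1, (n_freq, n_hidden))
b2 = np.zeros(n_freq)

def enhance(noisy_frame):
    """Encode the noisy frame into a bottleneck, then decode an estimate."""
    h = np.tanh(W1 @ noisy_frame + b1)   # bottleneck representation
    return W2 @ h + b2                   # reconstructed clean spectrum

def mse_loss(estimate, clean):
    """Spectral regression objective typically used to train such models."""
    return float(np.mean((estimate - clean) ** 2))

clean = rng.normal(size=n_freq)
noisy = clean + rng.normal(0, 0.3, n_freq)
print(mse_loss(enhance(noisy), clean))
```

In practice the weights would be trained by gradient descent on paired noisy/clean frames; the sketch only shows the forward pass and the standard spectral objective.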
“…As an example, Turian and Henry have demonstrated that spectral measures of distance do not capture pitch differences, which are important for tasks such as tonal language recognition [13]. Another example of the limitations of spectral measures is that they can miss low-energy phonemes due to over-emphasis on energy differences [14].…”
Section: Introductionmentioning
confidence: 99%
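The low-energy-phoneme point above can be illustrated numerically: with a plain MSE on magnitude spectra, the same relative error costs far more on a high-energy frame than on a low-energy one, so quiet phonemes barely register in the loss. The frame values below are made up purely for illustration:

```python
import numpy as np

# Two frames: a high-energy vowel-like frame and a low-energy
# fricative-like frame (magnitudes are invented for illustration).
vowel = np.full(8, 10.0)       # high-energy frame
fricative = np.full(8, 0.1)    # low-energy frame

# The enhancer makes the same 10% *relative* error on both frames.
est_vowel = vowel * 0.9
est_fricative = fricative * 0.9

err_vowel = np.mean((est_vowel - vowel) ** 2)
err_fricative = np.mean((est_fricative - fricative) ** 2)

# Magnitude MSE is dominated by the high-energy frame:
print(round(err_vowel / err_fricative))  # ~10000x larger penalty
```

A phonetically informed loss sidesteps this by scoring frames on their linguistic content rather than raw energy.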
“…1. Our approach uses a recognition model pre-trained with phoneme targets and clean input speech to generate a phonetic perceptual loss for improved enhancement training [14]- [16]. This approach preserves modularity, and achieved state-of-the-art recognition scores on the CHiME-2 challenge [17] for systems using the default language model.…”
Section: Introductionmentioning
confidence: 99%
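The statement above describes the mimic-loss idea from the indexed paper: a frozen recognizer pre-trained on clean speech provides a phonetic term on top of the usual spectral loss. The sketch below is an assumed simplification — a random linear map stands in for the pre-trained acoustic model, and the squared difference between its soft outputs on enhanced and clean speech plays the role of the phonetic term:

```python
import numpy as np

rng = np.random.default_rng(1)

n_freq, n_phone = 64, 40
# Frozen "recognizer": a linear map from a spectral frame to phonetic
# class scores. In the real mimic-loss setup this is a pre-trained
# acoustic model; a random matrix stands in for it here.
W_rec = rng.normal(0, 0.1, (n_phone, n_freq))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mimic_loss(enhanced, clean, weight=1.0):
    """Spectral loss plus a phonetic term: mismatch between the frozen
    recognizer's soft outputs on enhanced vs. clean speech."""
    spectral = np.mean((enhanced - clean) ** 2)
    p_enh = softmax(W_rec @ enhanced)
    p_cln = softmax(W_rec @ clean)
    phonetic = np.mean((p_enh - p_cln) ** 2)   # "soft-label" mimic term
    return float(spectral + weight * phonetic)

clean = rng.normal(size=n_freq)
enhanced = clean + rng.normal(0, 0.2, n_freq)
print(mimic_loss(enhanced, clean))
```

Because the recognizer is frozen, only the enhancement model receives gradients from the phonetic term, which is what preserves the modularity the citing authors highlight.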