ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9413439

Real-Time Speech Frequency Bandwidth Extension

Abstract: In this paper we propose a lightweight model for frequency bandwidth extension of speech signals, increasing the sampling frequency from 8 kHz to 16 kHz while restoring the high frequency content to a level almost indistinguishable from the 16 kHz ground truth. The model architecture is based on SEANet (Sound EnhAncement Network), a wave-to-wave fully convolutional model, which uses a combination of feature losses and adversarial losses to reconstruct an enhanced version of the input speech. In addition, we propo…

Cited by 22 publications (22 citation statements)
References 15 publications (21 reference statements)
“…While deep learning based audio super resolution methods have been proven effective, deploying such solutions to a resource-limited embedded system has not been fully investigated. Similar to our proposal, several super resolution deep learning methods [25,28,32] have proven the feasibility of applying the super resolution method on a smartphone. Other state-of-the-art speech super resolution models require considerable computation resources and cause significant latency, which is not suitable for edge device deployment.…”
Section: Audio Super Resolution Techniques (mentioning)
confidence: 67%
“…Adversarial learning is another popular training technique. In this technique, a discriminator that works either in the time domain [15,19,28] or frequency domain [11,22,27] guides the generator to predict more realistic high-resolution audio from low-resolution inputs.…”
Section: Audio Super Resolution Techniques (mentioning)
confidence: 99%
“…The model architecture is based on the decoder described in [27,28], which is a real-time streaming-capable version of MelGAN [29]. Its structure with parameters is shown in Table 1.…”
Section: Neural Network Based Spectrogram Inversion (mentioning)
confidence: 99%
“…Although only a few studies have applied GAN models to bandwidth extension of music signals [10], [39], many recent works have applied them for speech [14], [13], [40]. Eskimez et al [40] proposed one of the earliest works using an adversarial approach for speech super-resolution.…”
Section: B. GANs for Audio Bandwidth Extension (mentioning)
confidence: 99%
“…During recent years, many works have used modern deep learning technologies for bandwidth extension, but usually with the final goal of increasing the sampling rate of modern digital audio signals. Only a few exceptions are relevant to music signal processing [8], [9], [10], whereas most of these studies focus on processing speech [11], [12], [13], [14], [15].…”
Section: Introduction (mentioning)
confidence: 99%