Abstract: This paper presents a waveform modeling and generation method using hierarchical recurrent neural networks (HRNN) for speech bandwidth extension (BWE). Unlike conventional BWE methods, which predict spectral parameters for reconstructing wideband speech waveforms, this method models and predicts waveform samples directly without using vocoders. Inspired by SampleRNN, an unconditional neural audio generator, the HRNN model represents the distribution of each wideband or high-frequency wavefor…
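Sample-level autoregressive models such as SampleRNN typically operate on waveforms quantized to a small discrete alphabet, predicting a categorical distribution over the next sample. As an illustration of that shared preprocessing step only (not the HRNN architecture itself), a μ-law encode/decode pair can be sketched as:

```python
import numpy as np

def mu_law_encode(x, mu=255):
    """Quantize waveform samples in [-1, 1] to mu+1 discrete levels,
    the companding step used by sample-level autoregressive models."""
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return np.rint((y + 1) / 2 * mu).astype(int)

def mu_law_decode(q, mu=255):
    """Map discrete levels back to approximate waveform amplitudes."""
    y = 2 * q.astype(float) / mu - 1
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

# round trip: the error is bounded by the (non-uniform) quantization step
x = np.linspace(-1.0, 1.0, 101)
x_hat = mu_law_decode(mu_law_encode(x))
```

The companding concentrates quantization levels near zero, where speech amplitudes are most common, which is why 8-bit μ-law suffices for these models.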
“…Feng et al [6] used FFTNet [26] which resembles the classical FFT process. Ling et al [27] proposed a hierarchical RNN to utilize the waveform structures. Several other efforts incorporated time-frequency information while still operating in the time domain.…”
Speech generation and enhancement have seen recent breakthroughs in quality thanks to deep learning. These methods typically operate at a limited sampling rate of 16-22 kHz due to computational complexity and available datasets, which imposes a gap between their output and that of high-fidelity (≥44 kHz) real-world audio applications. This paper proposes a new bandwidth extension (BWE) method that expands 8-16 kHz speech signals to 48 kHz. The method is based on a feed-forward WaveNet architecture trained with a GAN-based deep feature loss. A mean-opinion-score (MOS) experiment shows a significant improvement in quality over state-of-the-art BWE methods, and an AB test reveals that our 16-to-48 kHz BWE achieves fidelity that is typically indistinguishable from real high-fidelity recordings. We use our method to enhance the output of recent speech generation and denoising methods, and experiments demonstrate significant improvement in sound quality over these baselines. We propose this as a general approach to narrowing the gap between generated speech and recorded speech, without the need to adapt such methods to higher sampling rates.
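For intuition on what a BWE model must supply: naively upsampling a 16 kHz signal to 48 kHz (here via ideal sinc interpolation, implemented as spectral zero-padding) preserves the low band but leaves everything above 8 kHz empty; the learned model's job is to fill that missing high band. A minimal sketch of the naive baseline, not of the paper's architecture:

```python
import numpy as np

def naive_upsample(x, factor):
    """Upsample by zero-padding the spectrum (ideal sinc interpolation).

    The output contains the same low-band content but an empty high
    band -- exactly the gap a learned BWE model is trained to fill.
    """
    n = len(x)
    X = np.fft.rfft(x)
    X_up = np.zeros(n * factor // 2 + 1, dtype=complex)
    X_up[: len(X)] = X
    # irfft normalizes by output length, so rescale by the factor
    return np.fft.irfft(X_up, n=n * factor) * factor

# a 1 kHz tone sampled at 16 kHz for 10 ms, upsampled to 48 kHz
sr = 16000
t = np.arange(int(sr * 0.01)) / sr
x = np.sin(2 * np.pi * 1000 * t)
y = naive_upsample(x, 3)
```

For a band-limited, frame-periodic input like this tone, every third output sample of `y` coincides with the original `x`, confirming the low band is untouched.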
“…Different approaches for extending the excitation signal are presented in [2], [3]. Different techniques for estimating the wideband (WB) spectral envelope are presented in [3][4][5][6][7]. However, traditional artificial bandwidth extension methods struggle to reconstruct WB speech with high quality under all conditions [8].…”
The limited narrowband frequency range, about 300-3400 Hz, used in telephone network channels results in less intelligible, poor-quality telephony speech. To address this drawback, a novel robust speech bandwidth extension using Discrete Wavelet Transform-Discrete Cosine Transform (DWT-DCT) based data hiding is proposed. In this technique, the missing speech information is embedded in the narrowband speech signal and reliably recovered at the receiver end to generate wideband speech of considerably better quality. The robustness of the proposed method to quantization and channel noise is confirmed by a mean-square-error test, and the improvement in the quality of the reconstructed wideband speech over conventional methods is confirmed by subjective listening and objective tests.
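The data-hiding idea can be sketched in miniature: transform the narrowband host, add a scaled payload to the detail coefficients, and invert the transform. The sketch below uses a one-level Haar DWT only (no DCT stage) and assumes non-blind extraction with access to the original host; both are simplifications of the proposed scheme, and all parameter values are illustrative:

```python
import numpy as np

def haar_dwt(x):
    """One-level Haar DWT: approximation and detail coefficients."""
    x = x[: len(x) // 2 * 2]
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def haar_idwt(a, d):
    """Inverse of haar_dwt: perfect reconstruction of the host."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def embed(host, payload, alpha=0.01):
    """Hide a scaled-down payload in the detail band of the host."""
    a, d = haar_dwt(host)
    return haar_idwt(a, d + alpha * payload[: len(d)])

def extract(marked, host, alpha=0.01):
    """Non-blind recovery: subtract the host's detail band and rescale."""
    _, d_m = haar_dwt(marked)
    _, d_h = haar_dwt(host)
    return (d_m - d_h) / alpha

rng = np.random.default_rng(0)
host = rng.standard_normal(64)       # stand-in for narrowband speech
payload = rng.standard_normal(32)    # stand-in for high-band information
recovered = extract(embed(host, payload), host)
```

A small `alpha` keeps the embedded payload perceptually insignificant in the host signal while still allowing exact arithmetic recovery in this noiseless setting.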
“…In [18] [19], Recurrent Neural Networks (RNNs) were introduced into the structure of the MPC because they can capture the system dynamics and provide long-range predictions [20]. It is well known that RNNs have issues with vanishing and exploding gradients, which can make their training difficult; therefore, we propose to use a special form of RNN, the Long Short-Term Memory (LSTM).…”
Reverse Osmosis (RO) desalination plants are highly nonlinear multi-input multi-output systems affected by uncertainties, constraints, and physical phenomena such as membrane fouling that are mathematically difficult to describe. Such systems require effective control strategies that take these effects into account; one such strategy is nonlinear model predictive control (NMPC). However, NMPC depends heavily on the accuracy of the internal model used for prediction in order to maintain feasible operating conditions of the RO desalination plant. Recurrent Neural Networks (RNNs), especially the Long Short-Term Memory (LSTM), can capture complex nonlinear dynamic behavior and provide long-range predictions even in the presence of disturbances. This paper therefore presents an NMPC for an RO desalination plant that uses an LSTM as the predictive model. The controller is tested on the task of maintaining a given permeate flow rate while keeping the permeate concentration under a certain limit by manipulating the feed pressure. Results show good performance of the system.
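The control loop described here can be sketched as a receding-horizon search over candidate feed pressures. The trained LSTM plant model is replaced below by a toy linear response with made-up coefficients, so everything in this sketch is an illustrative assumption rather than the paper's model:

```python
import numpy as np

def predict(flow, conc, pressure):
    """Stand-in predictive model (the paper uses a trained LSTM here):
    a toy linear response of permeate flow and concentration to feed
    pressure. All coefficients are illustrative, not plant data."""
    next_flow = 0.9 * flow + 0.5 * pressure
    next_conc = 0.95 * conc - 0.2 * pressure + 1.0
    return next_flow, next_conc

def nmpc_step(flow, conc, target_flow, conc_limit,
              candidates=np.linspace(0.0, 10.0, 101), horizon=5):
    """Receding-horizon step: pick the constant feed pressure that best
    tracks the flow setpoint while penalizing concentration violations."""
    best_p, best_cost = candidates[0], np.inf
    for p in candidates:
        f, c, cost = flow, conc, 0.0
        for _ in range(horizon):
            f, c = predict(f, c, p)
            cost += (f - target_flow) ** 2
            if c > conc_limit:  # soft penalty on the constraint
                cost += 1e3 * (c - conc_limit) ** 2
        if cost < best_cost:
            best_p, best_cost = p, cost
    return best_p

# one closed-loop step: apply only the first move, then re-optimize
p = nmpc_step(flow=4.0, conc=8.0, target_flow=5.0, conc_limit=20.0)
```

In a real NMPC loop only the first optimized input is applied before the horizon is re-solved at the next sample, which is what gives the scheme its robustness to model error and disturbances.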