2018
DOI: 10.1109/taslp.2018.2798811

Waveform Modeling and Generation Using Hierarchical Recurrent Neural Networks for Speech Bandwidth Extension

Abstract: This paper presents a waveform modeling and generation method using hierarchical recurrent neural networks (HRNN) for speech bandwidth extension (BWE). Unlike conventional BWE methods, which predict spectral parameters for reconstructing wideband speech waveforms, this method models and predicts waveform samples directly, without using vocoders. Inspired by SampleRNN, an unconditional neural audio generator, the HRNN model represents the distribution of each wideband or high-frequency waveform…
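The abstract only sketches the architecture, so the following is a minimal PyTorch sketch of the general idea: a two-tier hierarchical recurrent model in which a slow frame-level RNN conditions a fast sample-level RNN that predicts quantized high-frequency samples. The module names, tier sizes, GRU cells, and 8-bit (256-level) sample quantization are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class HRNNSketch(nn.Module):
    """Two-tier hierarchical RNN for bandwidth extension (illustrative sketch).

    A slow "frame" tier runs once per frame of narrowband samples and
    conditions a fast "sample" tier that emits one quantized high-frequency
    sample per step, in the spirit of SampleRNN-style hierarchies.
    """

    def __init__(self, frame_size=16, hidden=256, levels=256):
        super().__init__()
        self.frame_size = frame_size
        # Slow tier: one step per frame of narrowband input samples.
        self.frame_rnn = nn.GRU(frame_size, hidden, batch_first=True)
        # Fast tier: one step per output sample, conditioned on the slow tier.
        self.sample_rnn = nn.GRU(1 + hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, levels)  # logits over quantized samples

    def forward(self, nb_frames, prev_samples):
        # nb_frames:    (batch, n_frames, frame_size) narrowband frames
        # prev_samples: (batch, n_frames * frame_size, 1) previously
        #               generated samples (teacher-forced during training)
        cond, _ = self.frame_rnn(nb_frames)                    # (B, F, H)
        # Upsample frame-level conditioning to sample rate by repetition.
        cond = cond.repeat_interleave(self.frame_size, dim=1)  # (B, T, H)
        x = torch.cat([prev_samples, cond], dim=-1)
        h, _ = self.sample_rnn(x)
        return self.out(h)  # (B, T, levels) logits per quantization level

# Toy usage: 4 frames of 16 narrowband samples -> 64 output distributions.
model = HRNNSketch()
nb = torch.randn(2, 4, 16)
prev = torch.randn(2, 64, 1)
logits = model(nb, prev)
print(logits.shape)  # torch.Size([2, 64, 256])
```

Training such a model would treat each output sample as a 256-way classification problem (cross-entropy over the quantization levels), with autoregressive sampling at generation time.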

Cited by 49 publications (18 citation statements) | References 40 publications

“…Feng et al. [6] used FFTNet [26], which resembles the classical FFT process. Ling et al. [27] proposed a hierarchical RNN to utilize the waveform structures. Several other efforts incorporated time-frequency information while still operating in the time domain.…”
Section: Related Work
Confidence: 99%
“…Different approaches for the extension of the excitation signal are presented in [2], [3]. Different techniques for estimating the wideband (WB) spectral envelope are presented in [3]–[7]. However, traditional artificial bandwidth extension methods struggle to reconstruct WB speech with high quality under all conditions [8].…”
Section: Introduction
Confidence: 99%
“…In [18], [19], Recurrent Neural Networks (RNNs) were introduced into the structure of the MPC because they can capture the system dynamics and provide long-range predictions [20]. It is well known that RNNs suffer from vanishing and exploding gradients, which can make their training difficult; we therefore propose to use a special form of RNN, the Long Short-Term Memory (LSTM).…”
Section: Introduction
Confidence: 99%
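The claim in this last excerpt, preferring an LSTM over a vanilla RNN as the learned dynamics model inside a model predictive control loop, can be illustrated with a short sketch. The module name, state and control dimensions, and one-step-ahead prediction setup below are assumptions for illustration, not details taken from the cited work.

```python
import torch
import torch.nn as nn

class LSTMDynamics(nn.Module):
    """One-step-ahead dynamics model x_{t+1} = f(x_t, u_t) for use in MPC.

    An LSTM is used instead of a vanilla RNN because its gated cell state
    mitigates vanishing/exploding gradients over long prediction horizons.
    """

    def __init__(self, state_dim=4, control_dim=1, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(state_dim + control_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, state_dim)

    def forward(self, states, controls):
        # states: (B, T, state_dim), controls: (B, T, control_dim)
        h, _ = self.lstm(torch.cat([states, controls], dim=-1))
        return self.head(h)  # predicted next states, (B, T, state_dim)

# Toy rollout: predict 20 successor states from a state/control trajectory.
model = LSTMDynamics()
x = torch.randn(1, 20, 4)
u = torch.randn(1, 20, 1)
print(model(x, u).shape)  # torch.Size([1, 20, 4])
```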