Interspeech 2016
DOI: 10.21437/interspeech.2016-522

Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices

Abstract: Acoustic models based on long short-term memory recurrent neural networks (LSTM-RNNs) were applied to statistical parametric speech synthesis (SPSS) and showed significant improvements in naturalness and latency over those based on hidden Markov models (HMMs). This paper describes further optimizations of LSTM-RNN-based SPSS for deployment on mobile devices: weight quantization, multi-frame inference, and robust inference using an ε-contaminated Gaussian loss function. Experimental results in subjective listening…
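The ε-contaminated Gaussian loss mentioned in the abstract is a robust training objective: the target is modelled as a mixture of a Gaussian and a small fraction ε of a broad "outlier" component, so occasional bad frames do not dominate the gradient. A minimal sketch follows, assuming a fixed contamination fraction `epsilon` and a constant outlier density `k` (both illustrative placeholders, not the paper's settings).

```python
import numpy as np

def eps_contaminated_gaussian_nll(y, mu, sigma, epsilon=0.05, k=1e-3):
    """Negative log-likelihood under an epsilon-contaminated Gaussian.

    Density: (1 - epsilon) * N(y; mu, sigma^2) + epsilon * k.
    Compared with a plain squared-error (Gaussian) loss, large residuals
    are penalized far less, which makes training robust to mis-aligned
    or mis-labelled frames. epsilon and k are illustrative values only.
    """
    gauss = np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)
    density = (1.0 - epsilon) * gauss + epsilon * k
    return -np.sum(np.log(density))
```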

Cited by 81 publications (50 citation statements)
References 34 publications
“…The hyperparameters σ_g, λ_ga, and λ_cp were 0.4, 10,000, and 10, respectively. The batch size, number of epochs, and reduction factor [49] were 32, 1,000, and 5. We used the Adam optimizer [50] and varied the learning rate over the course of training [51].…”
Section: Methods (mentioning)
confidence: 99%
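Reference [51] is not resolvable from this page, but "varied the learning rate over the course of training" typically means a warmup-then-decay schedule. Below is a hedged sketch of one common choice, a Transformer-style inverse-square-root decay with linear warmup; the cited work may use a different schedule, and `d_model` and `warmup_steps` are illustrative values.

```python
def warmup_inverse_sqrt_lr(step, d_model=512, warmup_steps=4000, base_lr=1.0):
    """Learning rate that ramps up linearly, then decays as 1/sqrt(step).

    Illustrative schedule only:
    lr = base_lr * d_model**-0.5 * min(step**-0.5, step * warmup_steps**-1.5)
    """
    step = max(step, 1)
    return base_lr * d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

# Example: feed the returned value to the optimizer before each update.
print(round(warmup_inverse_sqrt_lr(100), 6), round(warmup_inverse_sqrt_lr(10000), 6))
```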
“…In this section, we look at how to tailor deep learning to mobile networking applications from three perspectives, namely, mobile devices and systems, distributed data centers, and changing mobile network environments.
[513]: filter size shrinking, reducing input channels and late downsampling (CNN)
Howard et al. [514]: depth-wise separable convolution (CNN)
Zhang et al. [515]: point-wise group convolution and channel shuffle (CNN)
Zhang et al. [516]: Tucker decomposition (AE)
Cao et al. [517]: data parallelization by RenderScript (RNN)
Chen et al. [518]: space exploration for data reusability and kernel redundancy removal (CNN)
Rallapalli et al. [519]: memory optimizations (CNN)
Lane et al. [520]: runtime layer compression and deep architecture decomposition (MLP, CNN)
Huynh et al. [521]: caching, Tucker decomposition and computation offloading (CNN)
Wu et al. [522]: parameters quantization (CNN)
Bhattacharya and Lane [523]: sparsification of fully-connected layers and separation of convolutional kernels (MLP, CNN)
Georgiev et al. [97]: representation sharing (MLP)
Cho and Brand [524]: convolution operation optimization (CNN)
Guo and Potkonjak [525]: filters and classes pruning (CNN)
Li et al. [526]: cloud assistance and incremental learning (CNN)
Zen et al. [527]: weight quantization (LSTM)
Falcao et al. [528]: parallelization and memory sharing (stacked AE)
Fang et al. [529]: model pruning and recovery scheme (CNN)
Xu et al. [530]: reusable region lookup and reusable region propagation scheme (CNN)…”
Section: Tailoring Deep Learning to Mobile Networks (mentioning)
confidence: 99%
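The "weight quantization (LSTM)" entry attributed to Zen et al. [527] refers to the paper indexed on this page, which stores network weights at reduced precision to shrink the model for mobile deployment. Below is a generic sketch of affine 8-bit weight quantization; the paper's exact scheme (bit width, per-matrix vs. per-row scaling) may differ.

```python
import numpy as np

def quantize_weights(w, num_bits=8):
    """Affine quantization of a float weight matrix to signed integers.

    Stores integer codes plus one float scale and zero-point, cutting
    memory roughly 4x versus float32 at 8 bits. Generic sketch, not the
    exact scheme used in the paper.
    """
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    scale = max((w.max() - w.min()) / (qmax - qmin), 1e-12)
    zero_point = qmin - w.min() / scale
    codes = np.clip(np.round(w / scale + zero_point), qmin, qmax).astype(np.int8)
    return codes, scale, zero_point

def dequantize_weights(codes, scale, zero_point):
    """Recover approximate float weights for inference."""
    return (codes.astype(np.float32) - zero_point) * scale
```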
“…Beyond these works, researchers also successfully adapt deep learning architectures through other designs and sophisticated optimizations, such as parameters quantization [522], [527], sparsification and separation [523], representation and memory sharing [97], [528], convolution operation optimization [524], pruning [525], cloud assistance [526] and compiler optimization [532]. These techniques will be of great significance when embedding deep neural networks into mobile systems.…”
Section: A. Tailoring Deep Learning to Mobile Devices and Systems (mentioning)
confidence: 99%
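As one concrete instance of the "sparsification" and "pruning" techniques enumerated above, the sketch below zeroes out the smallest-magnitude entries of a weight matrix; it is a generic illustration rather than the specific method of any cited work.

```python
import numpy as np

def magnitude_prune(w, sparsity=0.9):
    """Set the smallest-magnitude weights to zero so that roughly
    `sparsity` fraction of entries become zero.

    The surviving weights can then be kept in a sparse format, reducing
    both model size and multiply-accumulate work at inference time.
    """
    threshold = np.quantile(np.abs(w), sparsity)
    mask = np.abs(w) > threshold
    return w * mask, mask
```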
“…Another direction focuses on efficient storage and representation of weights. Various techniques, such as weight sharing within Toeplitz matrices [19], weight tying through effective hashing [20], and appropriate weight quantization [21][22][23], can greatly reduce model size, in some cases at the expense of a slight performance degradation.…”
Section: Related Work (mentioning)
confidence: 99%
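To make "weight tying through effective hashing" concrete, the sketch below implements a HashedNets-style virtual weight matrix whose entries are looked up from a small shared parameter vector via a hash of their (row, column) index; the hash function and sizes are illustrative assumptions, not the cited method's exact design.

```python
import numpy as np

def hashed_matvec(x, shared_params, out_dim, seed=0):
    """Matrix-vector product with a hash-tied virtual weight matrix.

    Instead of storing out_dim * in_dim weights, entry W[i, j] is
    shared_params[h(i, j) % len(shared_params)], so memory is bounded by
    the size of the shared vector. Toy multiplicative hash for illustration.
    """
    in_dim = x.shape[0]
    rows = np.arange(out_dim)[:, None]
    cols = np.arange(in_dim)[None, :]
    idx = (rows * 2654435761 + cols * 40503 + seed) % len(shared_params)
    w = shared_params[idx]  # virtual (out_dim, in_dim) weight matrix
    return w @ x
```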