2013 IEEE International Conference on Acoustics, Speech and Signal Processing
DOI: 10.1109/icassp.2013.6638949

Low-rank matrix factorization for Deep Neural Network training with high-dimensional output targets

Abstract: While Deep Neural Networks (DNNs) have achieved tremendous success for large vocabulary continuous speech recognition (LVCSR) tasks, training of these networks is slow. One reason is that DNNs are trained with a large number of parameters (i.e., 10-50 million). Because networks are trained with a large number of output targets to achieve good performance, the majority of these parameters are in the final weight layer. In this paper, we propose a low-rank matrix factorization of the final weight layer.…
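
As a rough, hedged illustration of the idea in the abstract (hypothetical layer sizes, not figures from the paper), the Python/NumPy sketch below replaces a full final weight layer of size h x o with the product of two smaller matrices of rank r, which is the kind of low-rank factorization being proposed:

    import numpy as np

    # Hypothetical sizes: h hidden units, o output targets, chosen rank r.
    # These numbers are illustrative only, not the paper's configurations.
    h, o, r = 1024, 10000, 128

    # Full-rank final layer: h * o parameters.
    W_full = 0.01 * np.random.randn(h, o)
    full_params = W_full.size                  # 1024 * 10000 = 10,240,000

    # Low-rank factorization of the final layer: W is approximated by A @ B,
    # with A of size h x r and B of size r x o.
    A = 0.01 * np.random.randn(h, r)
    B = 0.01 * np.random.randn(r, o)
    low_rank_params = A.size + B.size          # 131,072 + 1,280,000 = 1,411,072

    print("reduction: %.1f%%" % (100.0 * (1.0 - low_rank_params / full_params)))

    # Forward pass through the factored layer for a batch of hidden activations.
    x = np.random.randn(32, h)                 # 32 hidden-layer output vectors
    logits = (x @ A) @ B                       # same shape as x @ W_full

Because h * r + r * o is much smaller than h * o when r is small, most of the parameters in the output layer disappear, which is where the training speed-up described in the abstract comes from.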

Cited by 490 publications (279 citation statements)
References 9 publications

“…test ER (%): Gaussian mixture model (GMM) [23] 26.3; SVM [23] 22.4; Hierarchical GMM [22] 21.0; Discriminative hierarchical GMM [24] 16.8; SVM with deep scattering spectrum [25] 15.9; our CNN ensemble 15.0 … the softmax [21]. Each frame is labeled with its segment label and one additional label from a neighboring segment.…”
Section: Other Expensive Featuresmentioning
confidence: 99%
“…On the other hand, all layers can be compressed using Xue et al.'s approach, which accelerates recognition but makes training even more expensive. The value of the linear bottleneck structure as a regularization method has not been identified in [15,16]. Our experiments show that using MN-SGD allows factorization of all layers when training from scratch.…”
Section: Improving Generalization Performancementioning
confidence: 77%
“…Recently, a more sophisticated approach for reducing the size of DNNs has been proposed. Sainath et al. [15] factored the weight matrices into the product of two smaller matrices. This is equivalent to inserting a linear bottleneck between two layers of the network.…”
Section: Improving Generalization Performancementioning
confidence: 99%
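
To make the equivalence described in the excerpt above concrete, here is a minimal sketch (PyTorch, hypothetical dimensions; not code from either cited paper) in which a single large linear layer is replaced by two stacked linear layers with no nonlinearity in between, i.e., a linear bottleneck whose composition is a rank-r weight matrix:

    import torch
    import torch.nn as nn

    # Hypothetical sizes: 1024 hidden units, 10000 output targets, bottleneck r = 128.
    h, o, r = 1024, 10000, 128

    # Original final layer: a single h -> o affine map.
    full_layer = nn.Linear(h, o)

    # Factored version: two linear maps with NO activation between them.
    # Their composition is the product of two small matrices (plus a bias),
    # which is the "linear bottleneck" view of the low-rank factorization.
    factored_layer = nn.Sequential(
        nn.Linear(h, r, bias=False),   # h -> r: the rank-r bottleneck projection
        nn.Linear(r, o),               # r -> o: maps the bottleneck to the output targets
    )

    x = torch.randn(32, h)
    assert full_layer(x).shape == factored_layer(x).shape  # both (32, o)

Because the two factors are learned jointly and no nonlinearity is added between them, only the parameter count of that layer changes; the input and output dimensions of the network stay the same.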
“…A sparse A has computational benefits such as low storage and computational complexity. Consequently, this work could be useful in sparse low-rank matrix factorization, which has numerous applications in machine learning, including learning [7], deep neural networks (deep learning) [8], and autoencoding. This work is also related to optimizing projection matrices, introduced in [9].…”
Section: Introductionmentioning
confidence: 99%