2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2014.6853587

Kernel methods match Deep Neural Networks on TIMIT

Abstract: Despite their theoretical appeal and grounding in tractable convex optimization techniques, kernel methods are often not the first choice for large-scale speech applications due to their significant memory requirements and computational expense. In recent years, randomized approximate feature maps have emerged as an elegant mechanism to scale-up kernel methods. Still, in practice, a large number of random features is required to obtain acceptable accuracy in predictive tasks. In this paper, we develop two algo…
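For readers unfamiliar with the randomized approximate feature maps mentioned in the abstract, the following is a minimal sketch of the standard random Fourier feature construction for the Gaussian kernel; the dimensions, bandwidth, and data here are illustrative assumptions, not values from the paper.

```python
import numpy as np

def random_fourier_features(X, num_features=2048, bandwidth=1.0, seed=0):
    """Map X (n x d) to features whose inner products approximate the
    Gaussian kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies sampled from the Fourier transform of the Gaussian kernel.
    W = rng.normal(scale=1.0 / bandwidth, size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

# Toy usage: a small batch of 40-dimensional acoustic-style feature vectors.
X = np.random.default_rng(1).normal(size=(8, 40))
Z = random_fourier_features(X, num_features=1024)
print(Z.shape)  # (8, 1024); Z @ Z.T approximates the exact kernel matrix
```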

Cited by 87 publications (79 citation statements)
References 9 publications
“…Kernel-based shallow models (which can be interpreted as two-layer neural networks with a fixed first layer), were also proposed to deal with speech tasks. In particular, [8] gave a kernel ridge regression method, which matched DNN on TIMIT. Inspired by this work, [9] applied an efficient one-vs-one kernel ridge regression for speech recognition.…”
Section: Introduction (mentioning)
confidence: 99%
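The excerpt above describes kernel ridge regression trained on top of random features. A minimal sketch of that combination, under assumed dimensions and a synthetic one-hot labeling task, is shown below; it is not the exact system of [8] or [9].

```python
import numpy as np

def ridge_on_random_features(Z_train, Y_train, reg=1e-3):
    """Closed-form ridge regression in the random-feature space:
    W = (Z^T Z + reg * I)^{-1} Z^T Y, solved as a linear system."""
    D = Z_train.shape[1]
    A = Z_train.T @ Z_train + reg * np.eye(D)
    return np.linalg.solve(A, Z_train.T @ Y_train)

# Toy usage with one-hot targets, mimicking multi-class frame classification.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))            # e.g. 200 frames, 40-dim features
labels = rng.integers(0, 10, size=200)    # 10 illustrative classes
Y = np.eye(10)[labels]                    # one-hot targets
Z = np.sqrt(2.0 / 512) * np.cos(X @ rng.normal(size=(40, 512))
                                + rng.uniform(0, 2 * np.pi, 512))
W = ridge_on_random_features(Z, Y)
pred = np.argmax(Z @ W, axis=1)           # predicted class per frame
print((pred == labels).mean())
```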
“…These semigroup kernels are particularly well-suited to data representations in the form of non-negative attributes and histograms. The scalability of this approach can be further improved via design of specialized parallel solvers [12] to handle a larger number of random features, while replacing Monte Carlo approximations with more efficient numerical integration techniques [22]. We plan to investigate a broader family of semigroup kernels on R d + and benchmark their performance across several applications.…”
Section: Discussion (mentioning)
confidence: 99%
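The excerpt above mentions replacing Monte Carlo approximations with more efficient numerical integration. One common instance is a quasi-Monte Carlo feature map that draws frequencies from a low-discrepancy sequence; the sketch below assumes a Gaussian kernel and a scrambled Halton sequence purely for illustration, rather than the semigroup kernels under discussion.

```python
import numpy as np
from scipy.stats import norm, qmc

def qmc_fourier_features(X, num_features=2048, bandwidth=1.0, seed=0):
    """Like random Fourier features, but the frequencies come from a
    low-discrepancy (Halton) sequence instead of i.i.d. Monte Carlo draws."""
    d = X.shape[1]
    # Halton points in the unit cube, pushed through the Gaussian inverse CDF.
    u = qmc.Halton(d=d, scramble=True, seed=seed).random(num_features)
    W = norm.ppf(u).T / bandwidth          # shape (d, num_features)
    rng = np.random.default_rng(seed)
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)
```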
“…In recent years, approximations to kernel functions via explicit low-dimensional feature maps [18,21,15,17,12] have emerged as an appealing strategy to turn the complexity of learning nonlinear kernel methods back to that of training linear models, which typically scale linearly in the number of data points in a variety of settings such as regression, classification [13] and principal component analysis. Importantly, storage requirements and test-time prediction speed can also be dramatically improved.…”
Section: Introduction (mentioning)
confidence: 99%
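To illustrate the point of this excerpt, the sketch below trains a linear classifier on explicitly mapped features in place of a nonlinear kernel machine; the scikit-learn components and the toy data are assumptions made for the example, not choices taken from the cited works.

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# Explicit feature map + linear model: training and prediction both scale
# linearly in the number of examples, unlike an exact kernel machine.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 40))
y = (X[:, 0] * X[:, 1] > 0).astype(int)   # an illustrative nonlinear target

model = make_pipeline(
    RBFSampler(gamma=0.5, n_components=1024, random_state=0),
    SGDClassifier(random_state=0),
)
model.fit(X, y)
print(model.score(X, y))
```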
“…The induced dynamical system is slow to evaluate and integrate at inference time. Random feature approximations to kernel functions have been extensively used to scale up training complexity and inference speed of kernel methods [9,25] in a number of applications. The quality of approximation can be explicitly controlled by the number of random features.…”
Section: B4 Approximation Via Random Matrix Features (mentioning)
confidence: 99%
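The last sentence of this excerpt, that approximation quality is controlled by the number of random features, can be checked empirically with a short script; the kernel, data, and feature counts below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
exact = np.exp(-sq_dists / 2.0)                  # exact Gaussian kernel

for num_features in (64, 256, 1024, 4096):
    W = rng.normal(size=(20, num_features))      # bandwidth fixed at 1
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    Z = np.sqrt(2.0 / num_features) * np.cos(X @ W + b)
    err = np.max(np.abs(Z @ Z.T - exact))
    print(f"{num_features:5d} features -> max Gram-matrix error {err:.3f}")
```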