We propose a simple two-step approach for speeding up convolution layers within large convolutional neural networks, based on tensor decomposition and discriminative fine-tuning. Given a layer, we use non-linear least squares to compute a low-rank CP-decomposition of the 4D convolution kernel tensor into a sum of a small number of rank-one tensors. In the second step, this decomposition is used to replace the original convolutional layer with a sequence of four convolutional layers with small kernels. After such a replacement, the entire network is fine-tuned on the training data using standard backpropagation. We evaluate this approach on two CNNs and show that it is competitive with previous approaches, achieving higher CPU speedups with smaller accuracy drops for the smaller of the two networks. Thus, for the 36-class character classification CNN, our approach obtains an 8.5x CPU speedup of the whole network with only a minor accuracy drop (1%, from 91% to 90%). For the standard ImageNet architecture (AlexNet), the approach speeds up the second convolution layer by a factor of 4x at the cost of a 1% increase in the overall top-5 classification error.
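As a minimal NumPy sketch (not the authors' code; all sizes and variable names here are illustrative), a rank-R CP factorization of a T×S×H×W kernel turns one convolution into a pipeline of four small ones: a 1x1 convolution over input channels, a depthwise Hx1 convolution, a depthwise 1xW convolution, and a 1x1 convolution back to the output channels:

```python
import numpy as np

rng = np.random.default_rng(0)
T, S, H, W = 4, 3, 3, 3   # out-channels, in-channels, kernel height, kernel width
R = 2                     # hypothetical small CP rank

# CP factors: K[t,s,h,w] = sum_r A[t,r] * B[s,r] * C[h,r] * D[w,r]
A = rng.standard_normal((T, R))
B = rng.standard_normal((S, R))
C = rng.standard_normal((H, R))
D = rng.standard_normal((W, R))
K = np.einsum('tr,sr,hr,wr->tshw', A, B, C, D)  # full 4D kernel

X = rng.standard_normal((S, 8, 8))              # input feature map

def conv2d(x, k):
    """Naive 'valid' cross-correlation; x: (S, Hi, Wi), k: (T, S, h, w)."""
    t, s, h, w = k.shape
    Ho, Wo = x.shape[1] - h + 1, x.shape[2] - w + 1
    out = np.zeros((t, Ho, Wo))
    for i in range(Ho):
        for j in range(Wo):
            out[:, i, j] = np.einsum('tshw,shw->t', k, x[:, i:i+h, j:j+w])
    return out

y_full = conv2d(X, K)                           # original single layer

# Equivalent four-layer pipeline:
z = np.einsum('sr,sij->rij', B, X)              # 1) 1x1 conv: S -> R channels
Ho = z.shape[1] - H + 1
z1 = np.zeros((R, Ho, z.shape[2]))
for i in range(Ho):                             # 2) depthwise Hx1 conv
    z1[:, i, :] = np.einsum('hr,rhj->rj', C, z[:, i:i+H, :])
Wo = z1.shape[2] - W + 1
z2 = np.zeros((R, Ho, Wo))
for j in range(Wo):                             # 3) depthwise 1xW conv
    z2[:, :, j] = np.einsum('wr,riw->ri', D, z1[:, :, j:j+W])
y_cp = np.einsum('tr,rij->tij', A, z2)          # 4) 1x1 conv: R -> T channels
```

`y_full` and `y_cp` agree exactly because the kernel was built from its CP factors; in practice the CP approximation is inexact, which is what motivates the fine-tuning step.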
We propose a new algorithm for the calculation of vibrational spectra of molecules using the tensor train decomposition. Under the assumption that the eigenfunctions lie on a low-parametric manifold of low-rank tensors, we suggest using well-known iterative methods that utilize matrix inversion (the locally optimal block preconditioned conjugate gradient method and inverse iteration) and solving the corresponding linear systems inexactly along this manifold. As an application, we accurately compute the vibrational spectra (84 states) of the acetonitrile molecule CH3CN on a laptop in one hour, using only 100 MB of memory to represent all computed eigenfunctions.
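A toy two-dimensional analogue of this idea (a sketch under simplifying assumptions, not the paper's TT code) is inverse iteration retracted onto the rank-1 matrix manifold after every step. The 2D Laplacian is a convenient test case because its ground state is separable, i.e. exactly rank-1 when reshaped to a matrix:

```python
import numpy as np

n = 10
L1 = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Dirichlet Laplacian
A = np.kron(L1, np.eye(n)) + np.kron(np.eye(n), L1)     # 2D Laplacian (n^2 x n^2)

rng = np.random.default_rng(0)
x = rng.standard_normal(n * n)
x /= np.linalg.norm(x)
for _ in range(30):
    y = np.linalg.solve(A, x)                  # exact solve here; inexact in the paper
    U, s, Vt = np.linalg.svd(y.reshape(n, n))  # retraction: keep the best rank-1 part
    y = s[0] * np.outer(U[:, 0], Vt[0]).ravel()
    x = y / np.linalg.norm(y)

lam = x @ A @ x                                     # Rayleigh quotient
lam_exact = 2 * (2 - 2 * np.cos(np.pi / (n + 1)))   # smallest eigenvalue of A
```

The iterate stays rank-1 throughout, yet `lam` converges to `lam_exact`; in the paper the same principle is applied in the TT format in high dimensions, which is what keeps the memory footprint at 100 MB.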
We propose a new cross-conv algorithm for the approximate computation of convolution in different low-rank tensor formats (tensor train, Tucker, hierarchical Tucker). It has better complexity with respect to the tensor rank than previous approaches, and hence a high potential impact in different applications. The key idea is to apply cross approximation in the "frequency domain", where convolution becomes a simple elementwise product. We illustrate the efficiency of our algorithm by computing the three-dimensional Newton potential and by presenting preliminary results for the solution of the Hartree-Fock equation on tensor-product grids.

The canonical format (also known as the CANDECOMP/PARAFAC model) dates back to 1927 [28]. A tensor is said to be in canonical format if it can be represented as
A(i_1, \dots, i_d) = \sum_{\alpha=1}^{r} U_1(i_1, \alpha) U_2(i_2, \alpha) \cdots U_d(i_d, \alpha),
where the minimal possible r is called the canonical rank. If a good CP approximation is known, many basic operations are fast to compute [5,35,25,26,39,50,6]. Nevertheless, the CP decomposition suffers from a serious drawback: there are no robust algorithms to compute it numerically for d > 2 [12]. Note that in two dimensions it can be computed in a stable way by using the SVD or, if the matrix is large, by rank-revealing algorithms. The Tucker format [61,10,11,39] is another classic tensor decomposition. It can be computed via stable algorithms, but the number of parameters grows exponentially in d. As a result, it is typically used only for problems with small d, especially the three-dimensional case. In higher dimensions other stable tensor formats, namely the tensor train (TT) [51,47] and hierarchical Tucker (HT) [27,19] formats, can be used. In contrast with the Tucker format, they do not suffer from the "curse of dimensionality". For more details on low-rank representations of tensors see the book by Hackbusch [24] and the reviews [38,21,43].

Related work. In this paper we focus on the fast computation of multidimensional convolution.
Although it is not difficult to implement convolution with complexity linear in d and n, a strong rank dependence may occur. The rank of the result is generally equal to the product of the ranks of f and g, and one should then truncate the representation to the necessary accuracy (by truncation we mean approximation in the same format with smaller rank). This approach was considered in [58,36] and may lead to high complexity when the ranks are large. A remarkable work is [29], where an algorithm for the computation of convolution in the so-called Quantized TT (QTT) format [37,45] was proposed. This algorithm has complexity O(d \log^{\alpha} n) and is asymptotically the best one. However, for n of practical interest the algorithm proposed in this paper is faster for the same discretization and approximation accuracy ε. This is due to the large constant hidden in the O(·) term of the QTT algorithm. The algorithm proposed in this paper is simple. First, we use the classic idea of representing discrete convolution as several Fourier transforms and one elementwise multiplication in the "frequency domain". The crucial step is to interpolate t...
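The two ingredients above can be checked in a few lines of NumPy (a sketch for d = 2 with rank-1 tensors; not the paper's cross-approximation code): in the frequency domain convolution is an elementwise product, and rank-1 (separable) terms convolve mode by mode, so a d-dimensional convolution reduces to d one-dimensional ones per rank-1 term:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16
f1, f2 = rng.standard_normal(n), rng.standard_normal(n)
g1, g2 = rng.standard_normal(n), rng.standard_normal(n)
F, G = np.outer(f1, f2), np.outer(g1, g2)    # two rank-1 "tensors" (d = 2)

# Full 2D convolution via FFT: an elementwise product in the frequency domain.
m = 2 * n - 1                                # size of the full (zero-padded) result
conv_fft = np.real(np.fft.ifft2(np.fft.fft2(F, (m, m)) * np.fft.fft2(G, (m, m))))

# Low-rank structure survives: rank-1 factors convolve independently per mode.
conv_sep = np.outer(np.convolve(f1, g1), np.convolve(f2, g2))
```

`conv_fft` and `conv_sep` coincide, which is why the convolution of low-rank tensors can be assembled from cheap one-dimensional convolutions of the factors; the rank growth discussed above comes from summing such terms over all pairs of ranks.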
In this note we take a new look at the local convergence of alternating optimization methods for low-rank matrices and tensors. Our abstract interpretation as sequential optimization on moving subspaces yields insightful reformulations of some known convergence conditions that focus on the interplay between the contractivity of classical multiplicative Schwarz methods with overlapping subspaces and the curvature of low-rank matrix and tensor manifolds. While the verification of the abstract conditions in concrete scenarios remains open in most cases, we are able to provide an alternative and conceptually simple derivation of the asymptotic convergence rate of the two-sided block power method of numerical linear algebra for computing the dominant singular subspaces of a rectangular matrix. This method is equivalent to an alternating least squares method applied to a distance function. The theoretical results are illustrated and validated by numerical experiments.
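The two-sided block power method itself is short to state (a hedged NumPy sketch with an artificially gapped test matrix; names are illustrative): alternately fix one side and re-orthogonalize the other, U ← orth(AV), V ← orth(AᵀU), which is exactly an alternating-least-squares sweep for the best rank-k subspaces:

```python
import numpy as np

rng = np.random.default_rng(0)
# Test matrix with a clear spectral gap after the third singular value,
# so the convergence rate (sigma_4 / sigma_3)^2 per sweep is small.
s = np.concatenate([[10.0, 8.0, 6.0], np.linspace(2.0, 0.1, 17)])
U0 = np.linalg.qr(rng.standard_normal((30, 20)))[0]
V0 = np.linalg.qr(rng.standard_normal((20, 20)))[0]
A = (U0 * s) @ V0.T                 # A = U0 diag(s) V0^T, a 30x20 matrix

k = 3
V = np.linalg.qr(rng.standard_normal((20, k)))[0]
for _ in range(50):
    U = np.linalg.qr(A @ V)[0]      # fix V, update the left subspace
    V = np.linalg.qr(A.T @ U)[0]    # fix U, update the right subspace

# Cosines of the principal angles between span(U) and the dominant left
# singular subspace; after convergence they are all (numerically) 1.
U_svd = np.linalg.svd(A)[0][:, :k]
cosines = np.linalg.svd(U_svd.T @ U)[1]
```

The asymptotic rate derived in the note is governed by the gap between the k-th and (k+1)-th singular values, which the hand-built spectrum above makes large on purpose.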