Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2013
DOI: 10.1145/2487575.2487591

Fast and scalable polynomial kernels via explicit feature maps

Abstract: Approximation of non-linear kernels using random feature mapping has been successfully employed in large-scale data analysis applications, accelerating the training of kernel machines. While previous random feature mappings run in O(ndD) time for n training samples in d-dimensional space and D random feature maps, we propose a novel randomized tensor product technique, called Tensor Sketching, for approximating any polynomial kernel in O(n(d + D log D)) time. Also, we introduce both absolute and relative error…
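As a rough illustration of the technique named in the abstract, the sketch below approximates the degree-p polynomial kernel ⟨x, y⟩^p by combining p independent Count Sketches of each input through FFT-based circular convolution, which is what yields the stated O(d + D log D) cost per vector. This is a minimal NumPy sketch under my own naming and toy parameters (count_sketch, tensor_sketch, D, p are illustrative), not the authors' reference implementation.

```python
import numpy as np

def count_sketch(x, h, s, D):
    """Count Sketch of vector x (length d): bin by hash h, flip by sign s."""
    cs = np.zeros(D)
    for j, xj in enumerate(x):
        cs[h[j]] += s[j] * xj
    return cs

def tensor_sketch(x, hashes, signs, D):
    """Approximate feature map for the degree-p polynomial kernel <x, y>^p.

    The p Count Sketches are combined by circular convolution, computed
    via the FFT in O(d + D log D) time per input vector.
    """
    prod = np.ones(D, dtype=complex)
    for h, s in zip(hashes, signs):
        prod *= np.fft.fft(count_sketch(x, h, s, D))
    return np.real(np.fft.ifft(prod))

# Toy check for a degree-2 polynomial kernel (illustrative parameters).
rng = np.random.default_rng(0)
d, D, p = 20, 1024, 2
hashes = [rng.integers(0, D, size=d) for _ in range(p)]
signs = [rng.choice([-1.0, 1.0], size=d) for _ in range(p)]

x, y = rng.standard_normal(d), rng.standard_normal(d)
approx = tensor_sketch(x, hashes, signs, D) @ tensor_sketch(y, hashes, signs, D)
exact = (x @ y) ** p
print(approx, exact)  # the sketch inner product should be close to <x, y>^p for large D
```

In expectation the inner product of two tensor sketches equals the exact polynomial kernel value, and the variance shrinks as D grows, which is why the sketch dimension D trades accuracy against cost.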

Cited by 277 publications (244 citation statements); references 20 publications.
“…Many mappings have been proposed. Examples include random Fourier projection [83], random projections [84]-[87], polynomial approximation [88], and hashing [89]-[92]. They differ in various aspects, which are beyond the scope of this paper.…”
Section: B. Approximation of Kernel Methods via Linear Classification
Mentioning confidence: 99%
“…While the kernel trick has been widely and successfully applied in large margin learning, the calculation of kernel matrices is a bottleneck of the kernel trick for large-scale data sets. In recent years, a lot of alternatives to the kernel trick have been proposed to reduce the computational and storage costs (see, e.g., [43][44][45][46]), which can approximate the induced feature mapping φ by a low-dimensional function φ̃(x) such that…”
Section: Preliminaries
Mentioning confidence: 99%
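To make the idea quoted above concrete, the snippet below shows one standard example of such a low-dimensional explicit map φ̃: random Fourier features for the Gaussian (RBF) kernel, where the inner product of the mapped vectors approximates the kernel value without forming a kernel matrix. This is a self-contained sketch under assumed names and parameters (rff_map, gamma, d, D are illustrative and not taken from the cited works).

```python
import numpy as np

def rff_map(x, W, b):
    """Random Fourier feature map phi~(x) approximating the Gaussian (RBF) kernel."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(x @ W + b)

rng = np.random.default_rng(1)
d, D, gamma = 10, 2000, 0.5
# Frequencies sampled so that E[cos(w^T (x - y))] = exp(-gamma * ||x - y||^2).
W = rng.standard_normal((d, D)) * np.sqrt(2.0 * gamma)
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

x, y = rng.standard_normal(d), rng.standard_normal(d)
approx = rff_map(x, W, b) @ rff_map(y, W, b)            # <phi~(x), phi~(y)>
exact = np.exp(-gamma * np.sum((x - y) ** 2))           # k(x, y)
print(approx, exact)  # the explicit-map inner product should approximate the kernel value
```

A linear model trained on φ̃(x) then behaves like a kernel machine at a fraction of the memory cost, which is the point the quoted passage makes.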
“…The recent fastfood algorithm [17] further speeds up RKS using matrix approximation techniques and reduces the time and space complexities. Other feature mapping techniques include those based on random projection [1,15,18,23], polynomial approximation [21], and hashing [19,32]. Existing feature mapping techniques, when combined with linear classifiers, can achieve both nonlinear separability and the higher scalability of linear classifiers. However, they cannot take advantage of the interpretability of linear classifiers.…”
Section: Related Work
Mentioning confidence: 99%