2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
DOI: 10.1109/ipdpsw.2018.00140

A Study of Clustering Techniques and Hierarchical Matrix Formats for Kernel Ridge Regression

Abstract: We present memory-efficient and scalable algorithms for kernel methods used in machine learning. Using hierarchical matrix approximations for the kernel matrix, the memory requirements, the number of floating point operations, and the execution time are drastically reduced compared to standard dense linear algebra routines. We consider both the general H-matrix hierarchical format as well as Hierarchically Semi-Separable (HSS) matrices. Furthermore, we investigate the impact of several preprocessing and cluster…
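To make the abstract's idea concrete, the following is a minimal sketch, not the paper's implementation: a one-level hierarchical partition keeps the two diagonal kernel blocks dense, compresses the off-diagonal block to low rank, and solves the regularized kernel ridge regression system by conjugate gradients using only the compressed matrix-vector product. The clustering (a crude median split on one coordinate), the kernel bandwidth, the rank, and all variable names are illustrative assumptions.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def gauss_kernel(X, Y, h=1.0):
    # Squared Euclidean distances, then the Gaussian (RBF) kernel.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * h * h))

rng = np.random.default_rng(0)
n, d, lam, rank = 1000, 5, 1e-2, 20
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# One-level clustering: a crude median split on the first coordinate.
# (The paper studies far better clusterings; this is only illustrative.)
order = np.argsort(X[:, 0])
I, J = order[: n // 2], order[n // 2:]

# Diagonal blocks stay dense; the off-diagonal block is compressed to
# low rank with a truncated SVD, as in a one-level HSS/H-matrix partition.
K_II = gauss_kernel(X[I], X[I])
K_JJ = gauss_kernel(X[J], X[J])
U, s, Vt = np.linalg.svd(gauss_kernel(X[I], X[J]))
U, Vt = U[:, :rank] * s[:rank], Vt[:rank]  # K_IJ ~= U @ Vt

def matvec(v):
    # (K + lam*I) v in the permuted ordering, using only the
    # compressed blocks; by symmetry K_JI = Vt.T @ U.T.
    a, b = v[: n // 2], v[n // 2:]
    top = K_II @ a + U @ (Vt @ b) + lam * a
    bot = Vt.T @ (U.T @ a) + K_JJ @ b + lam * b
    return np.concatenate([top, bot])

# Kernel ridge regression: solve (K + lam*I) alpha = y by CG,
# never forming the full n-by-n kernel matrix.
A = LinearOperator((n, n), matvec=matvec)
alpha, info = cg(A, y[order])  # alpha is in the permuted ordering
```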

Cited by 16 publications (10 citation statements) · References 30 publications
“…While we presented a preliminary analysis of the multiplicative update scheme's convergence behavior for a special case, future work is necessary for a thorough determination of the algorithm's region of convergence. There is also strong practical interest in the adaptation of the method to GPR extensions such as those based on non-Gaussian likelihoods and Nyström [36,37] or hierarchical low-rank approximations [38,39,40], as well as in
Table 1: Label Noise - Rate/Level: the percentage of corrupted labels and the ratio between the noise and the standard deviation of the pristine labels; R²: the coefficient of determination between the inferred and actual label noise; AUC: area under the ROC curve of a 'noisy label' classifier that thresholds the learned σ_i; Precision at Recall Level: precision of the classifier at specified recall levels. Regression accuracy - plain/basic/full: Σ = 0, σI, diag(σ), respectively.…”
Section: Discussion
Confidence: 99%
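For reference, here is a minimal sketch of the Nyström approximation this snippet mentions; the landmark-selection scheme, sizes, and variable names are illustrative assumptions, not taken from the cited works. With m landmark points one obtains a rank-m factorization K ≈ L Lᵀ whose matrix-vector product costs O(nm) instead of O(n²).

```python
import numpy as np

def gauss_kernel(X, Y, h=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * h * h))

rng = np.random.default_rng(1)
n, m = 2000, 100                 # m landmark points, m << n
X = rng.normal(size=(n, 3))

# Uniformly sampled landmarks; K ~= K_nm @ inv(K_mm) @ K_nm.T
idx = rng.choice(n, size=m, replace=False)
K_nm = gauss_kernel(X, X[idx])
K_mm = gauss_kernel(X[idx], X[idx])

# Factor once: K ~= L @ L.T with L = K_nm @ K_mm^{-1/2}
w, V = np.linalg.eigh(K_mm)
w = np.maximum(w, 1e-12)         # guard tiny/negative eigenvalues
L = K_nm @ (V / np.sqrt(w))      # n-by-m factor

v = rng.normal(size=n)
approx = L @ (L.T @ v)           # O(nm) matvec instead of O(n^2)
```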
“…While we presented a preliminary analysis of the multiplicative update scheme's convergence behavior for a special case, future work is necessary for a thorough determination of the algorithm's region of convergence. There is also strong practical interest in the adaptation of the method to GPR extensions such as those based on non-Gaussian likelihoods and Nystr öm [36,37] or hierarchical low-rank approximations [38,39,40], as well as in Table 1: Label Noise -Rate/Level: the percentage of corrupted labels and the ratio between the noise and the standard deviation of the pristine labels; R 2 : the coefficient of determination between the inferred and actual label noise; AUC: area under the ROC curve of a 'noisy label' classifier that thresholds the learned σ i ; Precision at Recall Level: precision of the classifier at specified recall levels. Regression accuracy -plain/basic/full: Σ = 0, σI, diag (σ), respectively.…”
Section: Discussionmentioning
confidence: 99%
“…Morton orderings and other space-filling curves have also been used to generate tilings for matrices in low spatial dimensions [24,17]. In higher-dimensional spaces, such as the feature spaces that appear in machine learning applications, approximate nearest neighbors [20,38,35] are computed based on random projection trees. These are generalizations of KD-trees, where the direction of the median split is randomized and is not one of the coordinate dimensions.…”
Section: Related Work
Confidence: 99%
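A toy sketch of the random projection tree split described in this snippet: unlike a KD-tree, each node projects its points onto a random unit direction and splits at the median. The function name, leaf size, and data are illustrative assumptions.

```python
import numpy as np

def rp_tree(points, idx=None, leaf_size=32, rng=None):
    """Recursively split `points` with random-direction median splits."""
    rng = np.random.default_rng() if rng is None else rng
    idx = np.arange(len(points)) if idx is None else idx
    if len(idx) <= leaf_size:
        return idx                          # leaf: a bucket of point ids
    u = rng.normal(size=points.shape[1])
    u /= np.linalg.norm(u)                  # random unit direction
    proj = points[idx] @ u
    median = np.median(proj)
    left, right = idx[proj <= median], idx[proj > median]
    if len(left) == 0 or len(right) == 0:   # degenerate split: stop here
        return idx
    return (rp_tree(points, left, leaf_size, rng),
            rp_tree(points, right, leaf_size, rng))

X = np.random.default_rng(2).normal(size=(500, 20))
tree = rp_tree(X)   # nested tuples of index buckets
```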
“…The aim of this work is to propose and analyse the use of the Hierarchically Semi-Separable (HSS) matrix representation [6] for the solution of large-scale kernel SVMs. Indeed, the use of HSS approximations of kernel matrices has already been investigated in [9,33] for the solution of large-scale kernel regression problems. The main reason for the choice of the HSS structure in this context can be summarised as follows:
• using the STRUctured Matrix PACKage (STRUMPACK) [34] it is possible to obtain HSS approximations of the kernel matrices without the need to store/compute explicitly the whole matrix K. Indeed, for kernel matrix approximations, STRUMPACK uses a partially matrix-free strategy (see [9]) essentially based on an adaptive randomized sampling which requires only a black-box matrix-times-vector multiplication routine and access to selected elements from the kernel matrix;…”
Section: Contribution
Confidence: 99%
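The adaptive randomized sampling mentioned in this snippet can be illustrated, under our own assumptions, by a basic randomized range finder in the Halko-Martinsson-Tropp style: it touches the matrix only through a black-box matrix-vector product, in the spirit of STRUMPACK's partially matrix-free compression. This is a simplified stand-in with hypothetical names, not STRUMPACK's actual API or algorithm.

```python
import numpy as np

def randomized_range(matvec, n, rank, oversample=10, rng=None):
    """Orthonormal basis for the approximate range of K, using only matvecs."""
    rng = np.random.default_rng() if rng is None else rng
    Omega = rng.normal(size=(n, rank + oversample))     # random test vectors
    Y = np.column_stack([matvec(Omega[:, j]) for j in range(Omega.shape[1])])
    Q, _ = np.linalg.qr(Y)          # basis for the sampled range of K
    return Q

# Toy usage: an explicit low-rank matrix stands in for a kernel matvec.
rng = np.random.default_rng(3)
n = 300
A = rng.normal(size=(n, 20)) @ rng.normal(size=(20, n))  # rank-20 matrix
Q = randomized_range(lambda x: A @ x, n, rank=20)
B = Q.T @ A                         # small projected factor: A ~= Q @ B
print(np.linalg.norm(A - Q @ B) / np.linalg.norm(A))      # near machine eps
```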