The Mean and Median Criteria for Kernel Bandwidth Selection for Support Vector Data Description

Gathering labeled data to train well-performing machine learning models is one of the critical challenges in many applications. Active learning aims at reducing the labeling costs by an efficient and effective allocation of costly labeling resources. In this article, we propose a decision-theoretic selection strategy that (1) directly optimizes the gain in misclassification error, and (2) uses a Bayesian approach by introducing a conjugate prior distribution to determine the class posterior to deal with uncertainties. By reformulating existing selection strategies within our proposed model, we can explain which aspects are not covered by the current state of the art and why this leads to the superior performance of our approach. Extensive experiments on a large variety of datasets and different kernels validate our claims.


“…We set the bandwidth of the kernel (γ = 1/(2s²)) according to the mean criterion proposed by Chaudhuri et al (2017) with p = 1:…”

Section: Methods (mentioning)

confidence: 99%

Toward optimal probabilistic active learning using a Bayesian approach

et al. 2021



“…To perform the bagging in S.B, we set the number K to 3. For the classifier, we use the Parzen Window Classifier as described in Chapelle (2005) using a Gaussian kernel and the mean-bandwidth heuristic as proposed in Chaudhuri et al (2017). The classifier is trained using a sliding window spanning the last 500 instances, i.e., w = 500.…”

Section: Design Of Experiments (mentioning)

confidence: 99%
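The setup quoted above (a Parzen window classifier with a Gaussian kernel, trained on a sliding window of the last w = 500 instances) can be illustrated with a minimal sketch. This is not the citing authors' implementation: the class name `SlidingParzenClassifier`, the helper `gamma_from_bandwidth`, and the parameter name `gamma` are hypothetical, and the kernel is assumed to have the form exp(-gamma * ||x - x'||^2) with gamma = 1/(2s^2) for a bandwidth s.

```python
import math
from collections import deque


def gamma_from_bandwidth(s):
    """Convert a bandwidth s into gamma = 1 / (2 * s^2) (assumed kernel form)."""
    return 1.0 / (2.0 * s ** 2)


class SlidingParzenClassifier:
    """Kernel-frequency classifier over the most recent `window` labeled instances."""

    def __init__(self, gamma, window=500):
        self.gamma = gamma
        # A bounded deque drops the oldest instance once the window is full.
        self.buffer = deque(maxlen=window)

    def add(self, x, label):
        self.buffer.append((tuple(x), label))

    def class_scores(self, x):
        """Sum of Gaussian kernel values between x and the stored instances, per class."""
        scores = {}
        for xi, yi in self.buffer:
            k = math.exp(-self.gamma * math.dist(x, xi) ** 2)
            scores[yi] = scores.get(yi, 0.0) + k
        return scores

    def predict(self, x):
        """Return the class with the largest kernel frequency, or None if empty."""
        scores = self.class_scores(x)
        if not scores:
            return None
        return max(scores, key=scores.get)


# Toy usage with a window of 3 instead of 500, so forgetting is easy to see.
clf = SlidingParzenClassifier(gamma=gamma_from_bandwidth(1.0), window=3)
clf.add((0.0, 0.0), "a")
clf.add((0.1, 0.0), "a")
clf.add((5.0, 5.0), "b")
near_origin = clf.predict((0.0, 0.1))
near_five = clf.predict((5.0, 4.9))
clf.add((5.1, 5.0), "b")  # the oldest instance, (0.0, 0.0), is forgotten here
```

The bounded `deque` is one simple way to realize the sliding window; the bandwidth `s` passed to `gamma_from_bandwidth` would come from whichever bandwidth heuristic is in use.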

Stream-based active learning for sliding windows under the influence of verification latency

et al. 2021


Stream-based active learning (AL) strategies minimize the labeling effort by querying the labels that improve the classifier’s performance the most. So far, these strategies have neglected the fact that an oracle or expert requires time to provide a queried label. We show that existing AL methods deteriorate or even fail under the influence of such verification latency. The problem is that these methods estimate a label’s utility on the currently available labeled data. However, by the time this label arrives, some of the current data may have become outdated and new labels may have arrived. In this article, we propose to simulate the data available at the time when the label would arrive. To this end, our method Forgetting and Simulating (FS) forgets outdated information and simulates the delayed labels to obtain more realistic utility estimates. We assume that the label’s arrival date is known a priori and that the classifier’s training data are bounded by a sliding window. Our extensive experiments show that FS improves stream-based AL strategies in settings with both constant and variable verification latency.


“…[54] refer the median of the pairwise Euclidean distances as a common choice. However, [15] demonstrate that the use of the average and the median of Euclidean distances to estimate σ produce similar clustering results for the majority of situations. They justify the use of the average distances by its simplicity and fast computation even when the dataset is large.…”

Section: Choice Of Kernel: the Radial Basis Function Kernel (mentioning)

confidence: 79%
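The comparison described in the quotation above, estimating σ from either the average or the median of the pairwise Euclidean distances, can be sketched as follows. This is only the plain pairwise-distance heuristic as the quotation describes it; the exact criteria in the cited papers may include further scaling terms that are not reproduced here. The function names `pairwise_distances`, `bandwidth`, and `rbf_kernel` are hypothetical, and the kernel form exp(-||a - b||^2 / (2σ^2)) is an assumption.

```python
import math
from itertools import combinations
from statistics import mean, median


def pairwise_distances(points):
    """Euclidean distances between all unordered pairs of points."""
    return [math.dist(a, b) for a, b in combinations(points, 2)]


def bandwidth(points, rule="mean"):
    """Estimate sigma as the mean or the median pairwise Euclidean distance."""
    dists = pairwise_distances(points)
    if not dists:
        raise ValueError("need at least two points")
    if rule == "mean":
        return mean(dists)
    if rule == "median":
        return median(dists)
    raise ValueError(f"unknown rule: {rule!r}")


def rbf_kernel(a, b, sigma):
    """Gaussian kernel exp(-||a - b||^2 / (2 * sigma^2)) (assumed form)."""
    return math.exp(-math.dist(a, b) ** 2 / (2.0 * sigma ** 2))


# Toy data: the three pairwise distances are 5, 10, and 5.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
sigma_mean = bandwidth(points, "mean")      # (5 + 10 + 5) / 3
sigma_median = bandwidth(points, "median")  # middle value of 5, 5, 10
```

On this toy data the mean is pulled upward by the single large distance while the median is not, which is the kind of difference the two estimates can show. Both require all pairwise distances here, so the cost grows quadratically with the number of points in this naive form.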

Determinantal consensus clustering

Vicente, Murua

2021

Preprint


Random restart of a given algorithm produces many partitions to yield a consensus clustering. Ensemble methods such as consensus clustering have been recognized as more robust approaches for data clustering than single clustering algorithms. We propose the use of determinantal point processes (DPPs) for the random restart of clustering algorithms based on initial sets of center points, such as k-medoids or k-means. The relation between DPPs and kernel-based methods makes DPPs suitable to describe and quantify similarity between objects. DPPs favor diversity of the center points within subsets, so subsets with more similar points are less likely to be generated than subsets with very distinct points. The current and most popular sampling technique is sampling center points uniformly at random. We show through extensive simulations that, in contrast to DPPs, this technique fails both to ensure diversity and to obtain good coverage of all data facets. These two properties are key to the good performance of DPPs with small ensembles. Simulations with artificial datasets and applications to real datasets show that determinantal consensus clustering outperforms classical algorithms such as k-medoids and k-means consensus clustering, which are based on uniform random sampling of center points.

Keywords: Classification • kernel-based validation index • Mercer kernel • partitioning about medoids • radial basis function • repulsion • Voronoi diagram


The Mean and Median Criteria for Kernel Bandwidth Selection for Support Vector Data Description

Cited by 29 publications

References 10 publications
