Abstract-In fuzzy clustering, soft cluster partitions are formed based on the similarity of the data points to the respective cluster prototypes. Similarity is defined in terms of simultaneous closeness with respect to all attributes. In some applications the values of many attributes have been measured, but a natural clustering, if it exists, occurs only within a (small) subset of these attributes. The remaining dimensions can be considered irrelevant: they can obscure an existing grouping and make it harder to discover the cluster structure. In probabilistic fuzzy clustering, irrelevant attributes can in the worst case lead to coinciding cluster centers. We study this effect in detail, as well as the robustness of different similarity functions and their possible parameterizations against irrelevant input dimensions. Empirical evidence is given for the different properties of the membership functions.
I. FUZZY CLUSTERING

Most fuzzy clustering algorithms are objective function based: they determine an optimal (fuzzy) partition of a given data set X = {x_j | j = 1, ..., n} into c clusters by minimizing the objective function

J(X, U, C) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2}    (1)

subject to the constraints

\sum_{j=1}^{n} u_{ij} > 0  for all i ∈ {1, ..., c}, and    (2)

\sum_{i=1}^{c} u_{ij} = 1  for all j ∈ {1, ..., n}.    (3)

Here u_ij ∈ [0, 1] is the membership degree of datum x_j to cluster i and d_ij is the distance between datum x_j and cluster i. The c × n matrix U = (u_ij) is called the fuzzy partition matrix, and C describes the set of clusters by stating location parameters (i.e. the cluster center) and possibly size and shape parameters for each cluster. The parameter m, m > 1, is called the fuzzifier or weighting exponent. It determines the "fuzziness" of the classification: with higher values of m the boundaries between the clusters become softer, with lower values they become harder. Usually m = 2 is chosen. Constraint (2) guarantees that no cluster is empty. Constraint (3) ensures that the membership degrees of a datum to the clusters sum up to 1, so that each datum has the same total influence. Because of the second constraint this approach is usually called probabilistic fuzzy clustering, since with it the membership degrees of a datum formally resemble the probabilities of its being a member of the corresponding clusters. The partitioning property of a probabilistic clustering algorithm, which "distributes" the weight of a datum over the different clusters, is due to this constraint.

Unfortunately, the objective function J cannot be minimized directly. Therefore an iterative algorithm is used, which alternately optimizes the membership degrees and the cluster parameters: first the membership degrees are optimized for fixed cluster parameters, then the cluster parameters are optimized for fixed membership degrees. The main advantage of this scheme is that in each of the two steps the optimum can be computed directly. By iterating the two steps the joint optimum is approached, although it cannot be guaranteed that the global optimum will be reached: the algorithm may get stuck in a local minimum of the objective function J. The update formulae are derived ...
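For the most common case of point prototypes (each cluster described only by a center vector c_i) and squared Euclidean distances d_ij^2 = ||x_j - c_i||^2, these derivations lead to the well-known fuzzy c-means update formulae, stated here in their textbook form for reference; variants for other prototype shapes or distance measures differ:

u_{ij} = \frac{d_{ij}^{-2/(m-1)}}{\sum_{k=1}^{c} d_{kj}^{-2/(m-1)}}, \qquad
c_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} x_j}{\sum_{j=1}^{n} u_{ij}^{m}}.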
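The alternating optimization scheme described above can be sketched in a few lines. The following minimal Python sketch assumes point prototypes and squared Euclidean distances; the function name fuzzy_c_means, its parameters, the random initialization of U, and the convergence test on the change of U are illustrative choices, not prescriptions from this paper.

import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Probabilistic fuzzy c-means (illustrative sketch, not the paper's exact variant).

    X : (n, p) array of n data points with p attributes
    c : number of clusters
    m : fuzzifier (m > 1), with m = 2 as the usual default
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Random fuzzy partition matrix U (c x n); each column sums to 1,
    # so constraint (3) holds from the start.
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)

    for _ in range(max_iter):
        # Step 1: optimize the cluster centers for fixed membership degrees.
        W = U ** m                                    # weights u_ij^m
        centers = (W @ X) / W.sum(axis=1, keepdims=True)

        # Step 2: optimize the membership degrees for fixed cluster centers.
        # Squared Euclidean distances d_ij^2 of every datum to every center.
        d2 = ((centers[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)                       # guard against d_ij = 0
        U_new = d2 ** (-1.0 / (m - 1.0))
        U_new /= U_new.sum(axis=0, keepdims=True)     # re-normalize: constraint (3)

        # Stop when the partition matrix barely changes any more.
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new

    return centers, U

For example, fuzzy_c_means(X, c=3) returns the cluster centers together with the fuzzy partition matrix U = (u_ij). Increasing the fuzzifier m softens the boundaries between the clusters, while values close to 1 make the partition nearly crisp, in line with the role of m described above.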