Subspace Metric Ensembles for Semi-supervised Clustering of High Dimensional Data

Yan, Bojun; Domeniconi, Carlotta

doi:10.1007/11871842_48

Cited by 9 publications

(3 citation statements)

References 11 publications


“…RSM is based on random sampling for original feature components to obtain different feature subsets. In recent years, it has been applied to FS, clustering, and other areas. When it is used for FS, it often finds the optimal result by evaluating a predefined number of features. It is a crucial step in determining the dimensions of the subspace in RSM.…”
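The sampling step described in this statement can be sketched as follows. This is a minimal illustration, assuming features are addressed by column index and each subspace is drawn uniformly without replacement; the names `random_subspaces` and `project` are hypothetical and do not come from the cited papers.

```python
import random


def random_subspaces(n_features, subspace_dim, n_subspaces, seed=0):
    """Draw feature-index subsets, each sampled without replacement.

    Hypothetical helper: the subspace dimension is a free parameter here,
    which is the quantity the statement above calls crucial to choose.
    """
    if not 0 < subspace_dim <= n_features:
        raise ValueError("subspace_dim must be in 1..n_features")
    rng = random.Random(seed)  # seeded so the draw is reproducible
    return [
        sorted(rng.sample(range(n_features), subspace_dim))
        for _ in range(n_subspaces)
    ]


def project(row, subspace):
    """Keep only the feature components listed in `subspace`."""
    return [row[j] for j in subspace]


subspaces = random_subspaces(n_features=10, subspace_dim=4, n_subspaces=3)
sample = [float(i) for i in range(10)]
views = [project(sample, s) for s in subspaces]  # one reduced view per subspace
```

Each reduced view could then be passed to a separate learner or clustering run, with the results combined afterwards.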

Section: The Proposed Methods (mentioning)

confidence: 99%

A Novel Integrated Feature Selection Method for the Rational Synthesis of Microporous Aluminophosphate

Wang et al. 2012, Ind. Eng. Chem. Res.


In this paper, an integrated feature selection model is proposed to explore the relationship between the synthetic factors and the specific resulting structure on the database of AlPO syntheses. Specifically, the proposed model can select the most significant synthetic factors for the generation of (6,12)-ring-containing structures. First, a random subspace method is employed to prerank the synthetic factors based on the predictive performance of a support vector machine. Then, the Fisher score is used to rank the synthetic factors and obtain a fusion weight. Finally, a sequential forward search method is utilized to select the most significant synthetic factors according to the highest predictive performance. In particular, the principal-component-analysis method is adopted as guidance for estimating the random subspace dimension. The results are judged on the numerical prediction of (6,12)-ring-containing AlPO structures. We also compare our method with several classical feature selection methods. The experimental results show that the proposed model exhibits higher predictive accuracy with fewer synthetic factors. The results also provide important guidance for the rational design and synthesis of microporous materials.
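The Fisher-score ranking step mentioned in this abstract can be illustrated with a short sketch. It assumes one common formulation of the Fisher score (between-class scatter divided by within-class scatter, computed per feature); the abstract does not state which variant the authors use, and the function name `fisher_scores` is hypothetical.

```python
from collections import defaultdict


def fisher_scores(X, y):
    """Return one Fisher score per feature column.

    X is a list of rows (lists of floats), y a list of class labels.
    Assumed formulation: sum_c n_c (mean_c - mean)^2 / sum_c n_c var_c.
    """
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between = 0.0
        within = 0.0
        for rows in rows_by_class.values():
            values = [r[j] for r in rows]
            n_c = len(values)
            mean_c = sum(values) / n_c
            var_c = sum((v - mean_c) ** 2 for v in values) / n_c
            between += n_c * (mean_c - overall_mean) ** 2
            within += n_c * var_c
        # A feature with zero within-class spread separates classes perfectly.
        scores.append(between / within if within > 0 else float("inf"))
    return scores


# Toy data: feature 0 separates the two classes, feature 1 does not.
X = [[0.0, 1.0], [0.2, 5.0], [1.0, 1.2], [1.2, 4.8]]
y = [0, 0, 1, 1]
ranking = sorted(range(2), key=lambda j: fisher_scores(X, y)[j], reverse=True)
```

Features with higher scores are ranked first; how such a ranking is fused with the random-subspace preranking is specific to the cited paper and is not reproduced here.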


“…There are even approaches that connect this idea with the idea of having constraints (see Sect. 2.3) that can guide the distance-learning (Yan and Domeniconi 2006). Let us note that, for this general approach of learning one (combined) result based on several representations, strong connections to ensemble clustering (Sect.…”

Section: Multiview Clustering (mentioning)

confidence: 99%

The blind men and the elephant: on meeting the problem of multiple truths in data from clustering and pattern mining perspectives

Zimek, Vreeken 2013, Mach Learn


In this position paper, we discuss how different branches of research on clustering and pattern mining, while rather different at first glance, in fact have a lot in common and can learn a lot from each other's solutions and approaches. We give brief introductions to the fundamental problems of different sub-fields of clustering, especially focusing on subspace clustering, ensemble clustering, alternative (as a variant of constraint) clustering, and multiview clustering (as a variant of alternative clustering). Second, we relate a representative of these areas, subspace clustering, to pattern mining. We show that, while these areas use different vocabularies and intuitions, they share common roots and they are exposed to essentially the same fundamental problems; in particular, we detail how certain problems currently faced by the one field, have been solved by the other field, and vice versa. The purpose of our survey is to take first steps towards bridging the linguistic gap between different (sub-) communities and to make researchers from different fields aware of the existence of similar problems (and, partly, of similar solutions or of solutions that could be transferred) in the literature on the other research topic.


“…Furthermore, learning an effective full rank distance metric by using constraints in high-dimensional spaces is impracticable since (a) the number of parameters to be estimated is the square of the dimensionality, and (b) typically insufficient side information is available in order to obtain accurate estimates. A typical solution to this problem is to reduce the dimensionality and to modify the distance metric in the reduced space, as in (Yan and Domeniconi, 2006). However, important information may be lost during a completely unsupervised dimension reduction (that does not use the side information) which may degrade the subsequent metric learning.…”
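Point (a) of this statement can be made concrete with a small worked calculation. The dimensionalities below (1000 original features, a 20-dimensional reduced space) are illustrative assumptions, not figures from the cited papers, and `full_rank_metric_params` is a hypothetical helper.

```python
def full_rank_metric_params(dim):
    """Number of entries in a full-rank dim x dim metric matrix."""
    return dim * dim


original_dim = 1000  # assumed size of the original feature space
reduced_dim = 20     # assumed size after dimensionality reduction

params_original = full_rank_metric_params(original_dim)  # 1,000,000 entries
params_reduced = full_rank_metric_params(reduced_dim)    # 400 entries
reduction_factor = params_original // params_reduced     # 2500x fewer
```

Under these assumptions, learning the metric in the reduced space requires estimating 2500 times fewer parameters from the same side information.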

Section: Introduction (mentioning)

confidence: 99%

Semi-Supervised Dimensionality Reduction Using Pairwise Equivalence Constraints

Çevikalp, Verbeek, Jurie et al. 2008

Proceedings of the Third International Conference on Computer Vision Theory and Applications


To deal with the problem of insufficient labeled data, usually side information (given in the form of pairwise equivalence constraints between points) is used to discover groups within data. However, existing methods using side information typically fail in cases with high-dimensional spaces. In this paper, we address the problem of learning from side information for high-dimensional data. To this end, we propose a semi-supervised dimensionality reduction scheme that incorporates pairwise equivalence constraints for finding a better embedding space, which improves the performance of subsequent clustering and classification phases. Our method builds on the assumption that points in a sufficiently small neighborhood tend to have the same label. Equivalence constraints are employed to modify the neighborhoods and to increase the separability of different classes. Experimental results on high-dimensional image data sets show that integrating side information into the dimensionality reduction improves the clustering and classification performance.

