This paper presents a new approach, dendrogram-based support vector machines (DSVM), for multi-class problems. First, the method builds a taxonomy of classes in a bottom-up manner using agglomerative hierarchical clustering (AHC). Second, an SVM is trained at each internal node of the taxonomy to separate the two class subsets of that node. Finally, to classify a query pattern, we present it to the "root" SVM and then, according to the output, pass it to one of the two child SVMs, and so on down to the "leaf" nodes. Classification therefore proceeds top-down through the taxonomy, from the root to the final level, which represents the classes; the pattern is assigned to the class of the last SVM reached. The AHC decomposition uses distance measures to group the classes in binary form at each level of the hierarchy. SVMs require little tuning and yield both high accuracy and good generalization for binary classification. DSVM therefore gives good results on multi-class problems, both by training an optimal number of SVMs and by classifying patterns rapidly in a top-down manner, selecting an optimal subset of SVMs to participate in the final decision. The proposed method is compared to other multi-class SVM methods on several complex problems.
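The DSVM pipeline described above (AHC taxonomy over classes, one binary SVM per internal node, top-down classification) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the paper's exact method: clustering is done on class centroids with average linkage, and the per-node SVMs use a default RBF kernel; the paper's distance measure and SVM settings may differ.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from sklearn.svm import SVC

class DSVM:
    """Dendrogram-based multi-class SVM sketch: one binary SVM per
    internal node of a class taxonomy built by hierarchical clustering."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # One centroid per class; AHC on the centroids gives the taxonomy
        # (assumption: average linkage on Euclidean distances).
        centroids = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        root = to_tree(linkage(centroids, method="average"))
        self.tree_ = self._train(root, X, y)
        return self

    def _train(self, node, X, y):
        if node.is_leaf():
            return self.classes_[node.id]          # leaf: a single class
        left = self.classes_[node.left.pre_order()]
        right = self.classes_[node.right.pre_order()]
        mask = np.isin(y, np.concatenate([left, right]))
        # Binary SVM separating the two class subsets at this node.
        svm = SVC(kernel="rbf").fit(X[mask], np.isin(y[mask], right))
        return (svm, self._train(node.left, X, y),
                self._train(node.right, X, y))

    def _descend(self, node, x):
        if not isinstance(node, tuple):
            return node                             # reached a class leaf
        svm, left, right = node
        return self._descend(right if svm.predict(x[None, :])[0] else left, x)

    def predict(self, X):
        # Top-down classification: each pattern only traverses the SVMs
        # on one root-to-leaf path, not all trained SVMs.
        return np.array([self._descend(self.tree_, x) for x in X])
```

Note that for K classes this trains exactly K-1 binary SVMs, and each query evaluates at most the depth of the tree, which is the efficiency argument made in the abstract.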
Feature selection, semi-supervised learning, and multi-label classification are distinct challenges for the machine learning and data mining communities. While previous works have addressed each of these problems separately, in this paper we show how they can be addressed together. We propose a unified framework for semi-supervised multi-label feature selection based on the Laplacian score. In particular, we show how to constrain this score when data are partially labeled and each instance is associated with a set of labels. We transform the labeled part of the data into soft constraints and show how to integrate them into a measure of feature relevance, according to the available labels. Experiments on benchmark data sets validate the proposed approach and compare it with other state-of-the-art feature selection methods in a multi-label context.
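The idea of injecting label-derived soft constraints into a Laplacian-score relevance measure can be sketched as follows. This is a simplified single-label illustration, not the paper's multi-label formulation: the k-NN similarity graph, the RBF bandwidth, and the hard 1/0 reweighting of labeled pairs are all assumptions made for the sketch.

```python
import numpy as np

def semisup_laplacian_score(X, y, k=5, sigma=1.0):
    """Laplacian score with soft constraints from partial labels
    (y = -1 marks unlabeled instances). Lower score = more relevant.
    Simplified single-label sketch; the weighting scheme is an assumption."""
    n, d = X.shape
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.exp(-d2 / (2 * sigma ** 2))
    # Keep only symmetric k-nearest-neighbour edges.
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]
    keep = np.zeros((n, n), dtype=bool)
    keep[np.repeat(np.arange(n), k), nn.ravel()] = True
    S = np.where(keep | keep.T, S, 0.0)
    # Soft constraints on the labeled part: same label -> fully similar,
    # different labels -> dissimilar, overriding the geometric similarity.
    lab = np.where(y >= 0)[0]
    for i in lab:
        for j in lab:
            if i != j:
                S[i, j] = 1.0 if y[i] == y[j] else 0.0
    D = S.sum(1)                    # degree vector of the constrained graph
    L = np.diag(D) - S              # graph Laplacian
    scores = np.empty(d)
    for r in range(d):
        f = X[:, r]
        f = f - (f @ D) / D.sum()   # degree-weighted centering
        denom = f @ (D * f)
        scores[r] = (f @ L @ f) / denom if denom > 0 else np.inf
    return scores
```

A feature that varies little across graph edges (geometric neighbours and must-link pairs) relative to its overall variance receives a low score and is judged relevant.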
In this paper, we present a new SOM-based bi-clustering approach for continuous data, called Bi-SOM (Bi-clustering based on Self-Organizing Map). Bi-clustering aims to simultaneously group the rows and columns of a given data matrix. In addition, we propose to deal with several issues related to this task: (1) the topological visualization of bi-clusters with respect to their neighborhood relations, (2) the optimization of these bi-clusters into macro-blocks, and (3) dimensionality reduction by iteratively eliminating noise blocks. Finally, experiments on several data sets validate our approach in comparison with other bi-clustering methods.
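Bi-SOM itself is not available in standard libraries, but the core bi-clustering objective it pursues (simultaneously grouping rows and columns so that coherent blocks emerge) can be illustrated with scikit-learn's spectral co-clustering on a matrix with two planted blocks. This is a stand-in for the objective only, not the SOM-based method.

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

# Toy matrix with two planted bi-clusters:
# rows 0-5 x cols 0-4 and rows 6-11 x cols 5-9 carry the signal.
rng = np.random.default_rng(0)
data = 0.1 * rng.random((12, 10))
data[:6, :5] += 1.0
data[6:, 5:] += 1.0

# Simultaneously partition rows and columns into 2 bi-clusters.
model = SpectralCoclustering(n_clusters=2, random_state=0).fit(data)
rows, cols = model.row_labels_, model.column_labels_
```

Each bi-cluster is the intersection of a row group and a column group; a method like Bi-SOM additionally arranges such blocks on a topological map so that neighbouring blocks are similar.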
Dimensionality reduction is an important task when dealing with high-dimensional data. It can be achieved by feature selection, which means selecting the most appropriate features for data analysis. A recently addressed challenge in feature selection research is handling a small amount of labeled data together with a large amount of unlabeled data sampled from the same population. The supervision information may be used in the form of pairwise constraints, which have proven in practice to have very positive effects on learning performance. Nevertheless, the chosen constraint sets may affect learning performance significantly, whether positively or negatively. In this paper, we present a novel feature selection approach based on an efficient selection of pairwise constraints, which aims to retain the most coherent constraints extracted from the labeled part of the data. We then evaluate the relevance of each feature according to its locality preserving and constraint preserving ability. Finally, experimental results validate our proposal in comparison with other well-known feature selection methods.
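The two building blocks above, extracting pairwise constraints from the labeled part of the data and scoring a feature by how well it preserves them, can be sketched as follows. This uses a basic constraint-score ratio as a stand-in; the paper's coherence-based constraint selection and its combined locality-plus-constraint criterion are not reproduced here.

```python
import numpy as np

def pairwise_constraints(y_labeled, idx):
    """Build must-link / cannot-link pairs from the labeled part:
    same label -> must-link, different labels -> cannot-link."""
    ml, cl = [], []
    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            pair = (idx[a], idx[b])
            (ml if y_labeled[a] == y_labeled[b] else cl).append(pair)
    return ml, cl

def constraint_score(X, ml, cl):
    """Per-feature constraint score: ratio of squared differences over
    must-link pairs to those over cannot-link pairs (lower = better,
    i.e. the feature keeps same-class pairs close and separates others)."""
    scores = np.empty(X.shape[1])
    for r in range(X.shape[1]):
        f = X[:, r]
        num = sum((f[i] - f[j]) ** 2 for i, j in ml)
        den = sum((f[i] - f[j]) ** 2 for i, j in cl)
        scores[r] = num / den if den > 0 else np.inf
    return scores
```

In this sketch every labeled pair becomes a constraint; the approach in the abstract would instead filter this set down to the most coherent constraints before scoring.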