The rising interest in pattern recognition and data analytics has spurred the development of innovative machine learning algorithms and tools. However, as each algorithm has its strengths and limitations, one is motivated to judiciously fuse multiple algorithms in order to find the "best" performing one, for a given dataset. Ensemble learning aims at such highperformance meta-algorithm, by combining the outputs from multiple algorithms. The present work introduces a blind scheme for learning from ensembles of classifiers, using a moment matching method that leverages joint tensor and matrix factorization. Blind refers to the combiner who has no knowledge of the groundtruth labels that each classifier has been trained on. A rigorous performance analysis is derived and the proposed scheme is evaluated on synthetic and real datasets.
The immense amount of daily generated and communicated data presents unique
challenges in their processing. Clustering, the grouping of data without the
presence of ground-truth labels, is an important tool for drawing inferences
from data. Subspace clustering (SC) is a relatively recent method that is able
to successfully classify nonlinearly separable data in a multitude of settings.
In spite of their high clustering accuracy, SC methods incur prohibitively high
computational complexity when processing large volumes of high-dimensional
data. Inspired by random sketching approaches for dimensionality reduction, the
present paper introduces a randomized scheme for SC, termed Sketch-SC, tailored
for large volumes of high-dimensional data. Sketch-SC accelerates the
computationally heavy parts of state-of-the-art SC approaches by compressing
the data matrix across both dimensions using random projections, thus enabling
fast and accurate large-scale SC. Performance analysis as well as extensive
numerical tests on real data corroborate the potential of Sketch-SC and its
competitive performance relative to state-of-the-art scalable SC approaches
The rising interest in pattern recognition and data analytics has spurred the development of a plethora of machine learning algorithms and tools. However, as each algorithm has its strengths and weaknesses, one is motivated to judiciously fuse multiple algorithms in order to find the "best" performing one, for a given dataset. Ensemble learning aims to create a highperformance meta-algorithm, by combining the outputs from multiple algorithms. The present work introduces a simple blind scheme for learning from ensembles of classifiers, using joint matrix factorization. Blind refers to the combiner who has no knowledge of the ground-truth labels that each classifier has been trained on. Performance is evaluated on synthetic and real datasets.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.