Novel non-parametric dimensionality reduction techniques such as t-distributed stochastic neighbor embedding (t-SNE) lead to a powerful and flexible visualization of high-dimensional data. One drawback of non-parametric techniques is their lack of an explicit out-of-sample extension. In this contribution, we propose an efficient extension of t-SNE to a parametric framework, kernel t-SNE, which preserves the flexibility of basic t-SNE but enables explicit out-of-sample extensions. We test the ability of kernel t-SNE in comparison to standard t-SNE on benchmark data sets, in particular addressing the generalization ability of the mapping for novel data. In the context of large data sets, this procedure enables us to train a mapping on a fixed-size subset only and to map all data afterwards in linear time. We demonstrate that this technique also yields satisfactory results for large data sets, provided that the information missing due to the small size of the subset is compensated for by auxiliary information such as class labels, which can be integrated into kernel t-SNE based on the Fisher information.
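The out-of-sample idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the parametric map is a normalized Gaussian kernel expansion over the training subset, with coefficients obtained from a pseudo-inverse. The function names, the bandwidth `sigma`, and the toy data are hypothetical, and a PCA projection stands in for the t-SNE embedding of the subset so that the block needs only NumPy.

```python
import numpy as np

def normalized_gaussian_kernel(X, centers, sigma):
    """Row-normalized Gaussian kernel between points X and the training centers."""
    sq_dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Subtracting the row minimum avoids underflow; it cancels in the normalization.
    sq_dists = sq_dists - sq_dists.min(axis=1, keepdims=True)
    K = np.exp(-sq_dists / (2.0 * sigma ** 2))
    return K / K.sum(axis=1, keepdims=True)

def fit_kernel_map(X_train, Y_train, sigma):
    """Coefficients A such that K(X_train) @ A reproduces the training embedding."""
    K = normalized_gaussian_kernel(X_train, X_train, sigma)
    return np.linalg.pinv(K) @ Y_train

def apply_kernel_map(X_new, X_train, A, sigma):
    """Map new points; the cost per point depends only on the subset size."""
    return normalized_gaussian_kernel(X_new, X_train, sigma) @ A

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 10))          # fixed-size training subset (toy data)

# Assumption: placeholder for a t-SNE embedding of the subset (first two PCs).
Xc = X_train - X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Y_train = Xc @ Vt[:2].T

sigma = 2.0                                   # hypothetical bandwidth
A = fit_kernel_map(X_train, Y_train, sigma)

X_new = rng.normal(size=(200, 10))            # remaining data, mapped afterwards
Y_new = apply_kernel_map(X_new, X_train, A, sigma)
Y_train_again = apply_kernel_map(X_train, X_train, A, sigma)
```

Because the subset size is fixed, mapping the remaining points costs a constant amount of work per point, which is where the linear-time behavior mentioned in the abstract would come from under these assumptions.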
Many different evaluation measures for dimensionality reduction can be summarized based on the co-ranking framework [6]. Here, we extend this framework in two ways: (i) we show that the current parameterization of the quality shows unpredictable behavior, even in simple settings, and we propose a different parameterization which yields more intuitive results; (ii) we propose how to link the quality to point-wise quality measures which can directly be integrated into the visualization.
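For orientation, the following sketch computes a standard rank-based neighborhood-preservation score in the spirit of the co-ranking framework, together with a per-point version that could be used to color a visualization. It shows the conventional K-neighborhood overlap only; the alternative parameterization proposed in the abstract is not specified here and is not reproduced. All function names and the toy data are hypothetical.

```python
import numpy as np

def rank_matrix(X):
    """ranks[i, j] = position of j when sorting all points by distance to i (self = 0)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, -1.0)                      # force each point to rank itself first
    order = np.argsort(d, axis=1, kind="stable")
    ranks = np.empty_like(order)
    rows = np.arange(X.shape[0])[:, None]
    ranks[rows, order] = np.arange(X.shape[0])[None, :]
    return ranks

def pointwise_quality(X_high, X_low, K):
    """Per point: fraction of its K nearest neighbors shared by both spaces."""
    r_high = rank_matrix(X_high)
    r_low = rank_matrix(X_low)
    shared = (r_high >= 1) & (r_high <= K) & (r_low >= 1) & (r_low <= K)
    return shared.sum(axis=1) / K

def q_nx(X_high, X_low, K):
    """Average neighborhood overlap over all points, a value in [0, 1]."""
    return pointwise_quality(X_high, X_low, K).mean()

rng = np.random.default_rng(0)
X_high = rng.normal(size=(60, 8))                  # toy high-dimensional data
X_low = X_high[:, :2]                              # crude projection: keep two coordinates

per_point = pointwise_quality(X_high, X_low, K=10)
overall = q_nx(X_high, X_low, K=10)
perfect = q_nx(X_high, 2.0 * X_high, K=10)         # rescaling preserves all ranks
```

The per-point values can be attached to the projected points as a color or size channel, which is one simple way to integrate a point-wise quality into the visualization.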
In this overview, commonly used dimensionality reduction techniques for data visualization and their properties are reviewed. The focus lies on an intuitive understanding of the underlying mathematical principles rather than on detailed algorithmic pipelines. Important mathematical properties of the techniques are summarized in tabular form. The behavior of representative techniques is demonstrated for three benchmarks, followed by a short discussion on how to quantitatively evaluate these mappings. In addition, three currently active research topics are addressed: how to devise dimensionality reduction techniques for complex non-vectorial data sets, how to easily shape dimensionality reduction techniques according to the user's preferences, and how to devise models that are suited for big data sets. Dimensionality reduction in its original form addresses the projection of high-dimensional vectors to a low-dimensional space. Often, however, data are not given in the form of vectors, but as pairwise relations between data points. Alternatively, data can possess additional structural elements such as temporal dynamics or an underlying graph structure that can be captured by natural dissimilarity measures such as alignment. There has been quite some effort to develop dimensionality reduction techniques for structures such as graph structures or time series, see, e.g., Ref 69 for a very promising graph drawing approach developed in the context of machine learning, or the
Although automated classifiers are a standard tool in many application areas, there is hardly any generic way to directly inspect their behavior beyond the mere classification of (sets of) data points. In this contribution, we propose a general framework for visualizing a given classifier and its behavior with respect to a given data set in two dimensions. More specifically, we use modern nonlinear dimensionality reduction (DR) techniques to project a given set of data points and their relation to the classification decision boundaries. Furthermore, since data are usually intrinsically more than two-dimensional and hence cannot be projected to two dimensions without information loss, we propose to use discriminative DR methods, which shape the projection according to a given class labeling, as is available in a classification setting. With a given data set, this framework can be used to visualize any trained classifier which provides a probability or certainty of the classification together with the predicted class label. We demonstrate the suitability of the framework for different dimensionality reduction techniques, for different attention foci of the visualization, and for different classifiers to be visualized.
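One way to picture such a visualization is to sample a grid in the two-dimensional projection space, map each grid point back to the data space, and query the classifier there. The sketch below follows that idea under strong simplifying assumptions that are not taken from the abstract: PCA stands in for the nonlinear (discriminative) DR method because it has a simple linear back-mapping, and a nearest-centroid rule with softmax certainties stands in for the trained classifier. All names and the toy data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two Gaussian classes in five dimensions.
X = np.vstack([rng.normal(-1.0, 1.0, size=(100, 5)),
               rng.normal(+1.0, 1.0, size=(100, 5))])
y = np.repeat([0, 1], 100)

# Stand-in classifier: nearest class centroid, with a softmax over negative
# squared distances serving as the classification certainty.
centroids = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict_proba(points):
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# Stand-in projection: PCA to two dimensions with its linear back-mapping.
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
W = Vt[:2].T                                               # shape (5, 2)

def project(points):
    return (points - mean) @ W

def back_map(points_2d):
    return points_2d @ W.T + mean

Z = project(X)                                             # 2-D positions of the data

# Regular grid covering the projected data.
n = 40
gx = np.linspace(Z[:, 0].min(), Z[:, 0].max(), n)
gy = np.linspace(Z[:, 1].min(), Z[:, 1].max(), n)
grid = np.array([[a, b] for b in gy for a in gx])

# Classify the back-mapped grid points; the label image shows the decision
# regions, and the certainty image shows how confident the classifier is.
proba = predict_proba(back_map(grid))
labels = proba.argmax(axis=1).reshape(n, n)
certainty = proba.max(axis=1).reshape(n, n)
```

The `labels` and `certainty` arrays can be drawn as a background image behind the scatter plot of `Z`, so that the projected points appear together with the class regions and the classifier's certainty.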