Background: Set comparisons permeate a large number of data analysis workflows, in particular workflows in the biological sciences. Venn diagrams are frequently employed for such analysis, but current tools are limited. Results: We have developed InteractiVenn, a more flexible tool for interacting with Venn diagrams including up to six sets. It offers a clean interface for Venn diagram construction and enables analysis of set unions while preserving the shape of the diagram. Set unions are useful to reveal differences and similarities among sets and may be guided in our tool by a tree or by a list of set unions. The tool also allows obtaining subsets' elements, saving and loading sets for further analyses, and exporting the diagram in vector and image formats. InteractiVenn has been used to analyze two biological datasets, but it may serve set analysis in a broad range of domains. Conclusions: InteractiVenn allows set unions in Venn diagrams to be explored thoroughly, thereby extending the ability to analyze combinations of sets with additional observations yielded by novel interactions between joined sets. InteractiVenn is freely available online at: www.interactivenn.net.
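As a rough illustration of the set operations described in this abstract, the following Python sketch computes the elements of each Venn region and then repeats the analysis after joining two sets by union. It is not InteractiVenn's code: the function names `venn_regions` and `merge_by_union`, the set names, and the example elements are all hypothetical.

```python
from itertools import combinations


def venn_regions(sets):
    """Map each combination of set names to the elements that belong to
    exactly those sets (one region of the Venn diagram)."""
    names = list(sets)
    regions = {}
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            # Elements shared by every set in the combination ...
            inside = set.intersection(*(sets[n] for n in combo))
            # ... minus anything that also appears in a set outside it.
            outside = set().union(*(sets[n] for n in names if n not in combo))
            region = inside - outside
            if region:
                regions[combo] = region
    return regions


def merge_by_union(sets, group, merged_name):
    """Replace the sets named in `group` with a single set, their union."""
    merged = {n: s for n, s in sets.items() if n not in group}
    merged[merged_name] = set().union(*(sets[n] for n in group))
    return merged


# Hypothetical example data (e.g. gene identifiers found in three samples).
sets = {
    "A": {"g1", "g2", "g3"},
    "B": {"g2", "g3", "g4"},
    "C": {"g3", "g5"},
}

regions = venn_regions(sets)
merged_regions = venn_regions(merge_by_union(sets, ["A", "B"], "A+B"))

for combo, elements in merged_regions.items():
    print(combo, sorted(elements))
```

Comparing `regions` with `merged_regions` shows the kind of observation a union can reveal: once A and B are joined, the elements that C shares with either of them appear as a single region.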
The problem of projecting multidimensional data into lower dimensions has been pursued by many researchers due to its potential application to data analysis of various kinds. This paper presents a novel multidimensional projection technique based on least square approximations. The approximations compute the coordinates of a set of projected points based on the coordinates of a reduced number of control points with defined geometry. We name the technique Least Square Projections (LSP). From an initial projection of the control points, LSP defines the positioning of their neighboring points through a numerical solution that aims at preserving a similarity relationship between the points given by a metric in mD. In order to perform the projection, a small number of distance calculations is necessary and no repositioning of the points is required to obtain a final solution with satisfactory precision. The results show the capability of the technique to form groups of points by degree of similarity in 2D. We illustrate that capability through its application to mapping collections of textual documents from varied sources, a strategic yet difficult application. LSP is faster and more accurate than other existing high quality methods, particularly where it was mostly tested, that is, for mapping text sets.
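The idea of positioning all points from a few control points through a least-squares solve can be sketched as follows. This is a simplified illustration under stated assumptions, not the published LSP implementation: the control points and their 2D positions are taken as given, each point is assumed to lie at the centroid of its k nearest neighbours, and neighbours are found by brute-force pairwise distances for brevity (which does not reflect the small number of distance calculations mentioned in the abstract). The names `lsp_sketch`, `control_idx`, `control_pos`, and `k` are hypothetical.

```python
import numpy as np


def lsp_sketch(X, control_idx, control_pos, k=2):
    """Place all points in 2D from a few already-projected control points.

    X           : (n, m) data in the original m-dimensional space
    control_idx : indices of the control points in X
    control_pos : (c, 2) 2D coordinates already assigned to the control points
    k           : number of nearest neighbours per point (assumed value)
    """
    X = np.asarray(X, dtype=float)
    control_pos = np.asarray(control_pos, dtype=float)
    n, c = len(X), len(control_idx)

    # Pairwise Euclidean distances in the original space; the diagonal is
    # set to infinity so that a point is never its own neighbour.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Neighbourhood rows: each point should sit at the centroid of its
    # k nearest neighbours, i.e. y_i - (1/k) * sum_j y_j = 0.
    A = np.zeros((n + c, n))
    b = np.zeros((n + c, 2))
    for i in range(n):
        A[i, i] = 1.0
        A[i, neighbours[i]] = -1.0 / k

    # Control rows: pull each control point towards its given 2D position.
    for row, (idx, pos) in enumerate(zip(control_idx, control_pos), start=n):
        A[row, idx] = 1.0
        b[row] = pos

    # One least-squares solve yields both coordinates of every point.
    Y, *_ = np.linalg.lstsq(A, b, rcond=None)
    return Y


# Hypothetical data: six points along a line in 3D, with the two end
# points used as control points.
X = np.array([[i, 0.0, 0.0] for i in range(6)])
Y = lsp_sketch(X, control_idx=[0, 5], control_pos=[[0.0, 0.0], [5.0, 0.0]])
print(Y.round(2))
```

Because the control positions enter as additional rows of the system rather than as hard constraints, the solve balances them against the neighbourhood rows, and no iterative repositioning of points is needed afterwards.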
Projection (or dimensionality reduction) techniques have been used as a means of handling the growing dimensionality of data sets as well as providing a way to visualize information coded into point relationships. Their role is essential in data interpretation, and the simultaneous use of different projections and their visualizations improves data understanding and increases the level of confidence in the result. For that purpose, projections should be fast to allow multiple views of the same data set. In this work we present a novel fast technique for projecting multi-dimensional data sets into bidimensional (2D) spaces that preserves neighborhood relationships. Additionally, a new technique for improving 2D projections from multi-dimensional data is presented that helps reduce the inherent loss of information caused by dimensionality reduction. The results are encouraging and are presented in the form of comparative visualizations against known and new 2D projection techniques. Based on the projection improvement approach presented here, a new metric for quality of projection is also given that matches the visual perception of quality well. We discuss the implications of using improved projections in visual exploration of large data sets and the role of interaction in visualization of projected subspaces.
Different regions of oral squamous cell carcinoma (OSCC) have particular histopathological and molecular characteristics limiting the standard tumor−node−metastasis prognosis classification. Therefore, defining biological signatures that allow assessing the prognostic outcomes for OSCC patients would be of great clinical significance. Using histopathology-guided discovery proteomics, we analyze neoplastic islands and stroma from the invasive tumor front (ITF) and inner tumor to identify differentially expressed proteins. Potential signature proteins are prioritized and further investigated by immunohistochemistry (IHC) and targeted proteomics. IHC indicates low expression of cystatin-B in neoplastic islands from the ITF as an independent marker for local recurrence. Targeted proteomics analysis of the prioritized proteins in saliva, combined with machine-learning methods, highlights a peptide-based signature as the most powerful predictor to distinguish patients with and without lymph node metastasis. In summary, we identify a robust signature, which may enhance prognostic decisions in OSCC and better guide treatment to reduce tumor recurrence or lymph node metastasis.
Multidimensional projections map data points, defined in a high-dimensional data space, into a 1D, 2D or 3D representation space. Such a mapping is typically achieved with dimensionality reduction, clustering, or force-directed point placement. Projections can be displayed and navigated by data analysts by means of visual representations, which may vary from points on a plane to graphs, surfaces or volumes. Typically, projections strive to preserve distance relationships amongst data points, as defined in the original space. Information loss is inevitable, and the projection approach defines the extent to which the distance-preserving goal is attained. We introduce PEx (the Projection Explorer), a visualization tool for mapping and exploration of high-dimensional data via projections. A set of examples, on both structured (table) and unstructured (text) data, illustrates how projection-based visualizations, coupled with appropriate exploration tools, offer a flexible set-up for multidimensional data exploration. The projections in PEx handle relatively large data sets at a computational cost adequate to user interaction.
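One common way to quantify how far a projection attains the distance-preserving goal is a normalized stress value, which is 0 when all pairwise distances are preserved. The Python sketch below is an assumption made for illustration: the abstract does not say which measure, if any, PEx uses, and the function name `normalized_stress` and the example points are hypothetical.

```python
import math


def normalized_stress(high, low):
    """sqrt( sum (d_high - d_low)^2 / sum d_high^2 ) over all point pairs.

    `high` holds the points in the original space and `low` the same
    points, in the same order, in the projected space.
    """
    num = den = 0.0
    for i in range(len(high)):
        for j in range(i + 1, len(high)):
            d_high = math.dist(high[i], high[j])
            d_low = math.dist(low[i], low[j])
            num += (d_high - d_low) ** 2
            den += d_high ** 2
    return math.sqrt(num / den)


# Hypothetical example: four points in 3D projected to 2D by dropping
# the third coordinate, which collapses the last point onto the first.
high = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
low = [(0, 0), (1, 0), (0, 1), (0, 0)]

stress = normalized_stress(high, low)
print(f"normalized stress: {stress:.3f}")
```

A value close to 0 indicates that the layout reflects the original distances well; larger values indicate more of the information loss that the abstract describes as inevitable.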