Figure 1: The housing dataset for Kings County displayed using Taxonomizer. The user interface consists of six coordinated views. (a) The semantic space. (b) The data space. (c) The hierarchy built by combining information spaces (a) and (b). (d) The cophenetic correlation plot, which allows users to specify the weights of (a) and (b) used to generate (c). (e) The control panel, which gives the user various options to manipulate the structure of the hierarchy. (f) The word suggestion panel, which gives suggestions for labeling the nodes of the hierarchy.

Abstract: Organizing multivariate data spaces by their dimensions or attributes can be a difficult task. Most of the work in this area focuses on statistical aspects such as correlation clustering and dimension reduction. These methods typically produce hierarchies in which the leaf nodes are labeled by the attribute names while the inner nodes are often represented only by a statistical measure and criterion, such as a threshold. This makes them difficult for mainstream users to understand. Taxonomies in science, biology, engineering, etc., on the other hand, are easy to comprehend because they provide meaningful labels at the inner nodes as well. Labeling the inner nodes of taxonomies automatically requires the identification of hypernyms. Our proposed framework, called Taxonomizer, takes a visual analytics approach to meet this challenge. It appeals to the wisdom of humans to liaise with state-of-the-art data analytics, neural word embeddings, and lexical databases. It consists of a set of visual tools that start out with an automatically computed hierarchy in which the leaf nodes are the original data attributes, and it then allows users to sculpt high-quality taxonomies for any multivariate dataset.
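To make the idea behind panels (c) and (d) concrete, the following is a minimal sketch of how a weighted combination of a semantic space and a data space could yield an initial attribute hierarchy, and how the cophenetic correlation could then indicate how faithfully that hierarchy reflects each space. It is not Taxonomizer's actual implementation: the function name `blended_hierarchy`, the cosine distance on attribute embeddings, the `1 - |Pearson correlation|` data distance, and the average-linkage clustering are all assumptions chosen for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist, squareform


def blended_hierarchy(semantic_vecs, data_matrix, weight):
    """Cluster attributes on a weighted mix of two distance measures.

    semantic_vecs: (n_attributes, embedding_dim) array, one embedding per
                   attribute name (hypothetical input).
    data_matrix:   (n_rows, n_attributes) array of attribute values.
    weight:        value in [0, 1]; 1 uses only the semantic space,
                   0 uses only the data space.
    """
    # Assumed semantic distance: cosine distance between embeddings.
    d_sem = pdist(semantic_vecs, metric="cosine")

    # Assumed data distance: 1 - |Pearson correlation| between columns,
    # converted from a square matrix to SciPy's condensed form.
    corr = np.corrcoef(data_matrix, rowvar=False)
    d_dat = squareform(1.0 - np.abs(corr), checks=False)

    # Weighted blend of the two spaces, then agglomerative clustering.
    d_mix = weight * d_sem + (1.0 - weight) * d_dat
    tree = linkage(d_mix, method="average")

    # Cophenetic correlation of the tree against each original space.
    c_sem, _ = cophenet(tree, d_sem)
    c_dat, _ = cophenet(tree, d_dat)
    return tree, c_sem, c_dat


# Synthetic stand-in data: 5 attributes, 8-dim embeddings, 100 records.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5, 8))
records = rng.normal(size=(100, 5))

for w in (0.0, 0.5, 1.0):
    tree, c_sem, c_dat = blended_hierarchy(embeddings, records, w)
    print(f"weight={w:.1f}  semantic fit={c_sem:.3f}  data fit={c_dat:.3f}")
```

Sweeping `weight` from 0 to 1 and plotting the two fit values against it gives a curve pair of the kind a cophenetic correlation plot could show, letting a user pick a balance between the two spaces.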