We present a technique for clustering categorical data by generating many dissimilarity matrices and averaging over them. We begin by demonstrating our technique on low-dimensional categorical data and comparing it to several other techniques that have been proposed. Then we give conditions under which our method should yield good results in general. Our method extends to high-dimensional categorical data of equal lengths by ensembling over many choices of explanatory variables. In this context we compare our method with two other methods. Finally, we extend our method to high-dimensional categorical data vectors of unequal length by using alignment techniques to equalize the lengths. We give examples to show that our method continues to provide good results; in particular, for genome sequences it gives better clusterings than those suggested by phylogenetic trees.
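The averaging idea can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes each dissimilarity matrix is a Hamming dissimilarity computed on a random subset of the variables, and the names `hamming`, `ensemble_dissimilarity`, `n_draws`, and `subset_size` are hypothetical.

```python
import random

def hamming(a, b, cols):
    """Fraction of the selected columns on which rows a and b disagree."""
    return sum(a[c] != b[c] for c in cols) / len(cols)

def ensemble_dissimilarity(rows, n_draws=50, subset_size=2, seed=0):
    """Average Hamming dissimilarity over random subsets of the variables.

    Assumption: random column subsets stand in for the 'many choices of
    explanatory variables'; the source does not specify how they are drawn.
    """
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    total = [[0.0] * n for _ in range(n)]
    for _ in range(n_draws):
        cols = rng.sample(range(p), subset_size)
        for i in range(n):
            for j in range(i + 1, n):
                d = hamming(rows[i], rows[j], cols)
                total[i][j] += d
                total[j][i] += d
    # Divide the accumulated sums by the number of draws to get the average.
    return [[v / n_draws for v in row] for row in total]

# Toy data: rows 0-1 and rows 2-3 each differ only in the last variable.
rows = [
    ("a", "x", "p", "m"),
    ("a", "x", "p", "n"),
    ("b", "y", "q", "m"),
    ("b", "y", "q", "n"),
]
D = ensemble_dissimilarity(rows)
```

The averaged matrix `D` could then be passed to any clustering routine that accepts a precomputed dissimilarity matrix, such as hierarchical clustering.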
Here, we propose a clustering technique for general clustering problems, including those that have non-convex clusters. For a given desired number of clusters K, we use three stages to find a clustering. The first stage uses a hybrid clustering technique to produce a series of clusterings of various sizes (randomly selected). The key steps are to find a K-means clustering using K_ℓ clusters, where K_ℓ ≫ K, and then to join these small clusters by using single linkage clustering. The second stage stabilizes the result of stage one by reclustering via the 'membership matrix' under Hamming distance to generate a dendrogram. The third stage is to cut the dendrogram to get K* clusters, where K* ≥ K, and then prune back to K to give a final clustering. A variant on our technique also gives a reasonable estimate for K_T, the true number of clusters. We provide a series of arguments to justify the steps in the stages of our method, and we provide numerous examples involving real and simulated data to compare our technique with other related techniques.
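The hybrid step of the first stage can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: it covers a single pass (many small K-means clusters, then single linkage merging), it uses a deterministic, evenly spaced initialisation for reproducibility, and the names `kmeans`, `single_linkage_merge`, `k_small`, and `k_final` are hypothetical. The second and third stages are not shown.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; returns labels from the last assignment step.

    Assumption: centroids start at evenly spaced input points so that the
    example is deterministic.
    """
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def single_linkage_merge(points, labels, k_final):
    """Merge small clusters until k_final groups remain (single linkage)."""
    by_label = {}
    for idx, lab in enumerate(labels):
        by_label.setdefault(lab, []).append(idx)
    groups = list(by_label.values())
    while len(groups) > k_final:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                # Single linkage: distance between the closest pair of points.
                d = min(math.dist(points[i], points[j])
                        for i in groups[a] for j in groups[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        groups[a].extend(groups.pop(b))
    final = [0] * len(points)
    for g, members in enumerate(groups):
        for i in members:
            final[i] = g
    return final

# Two elongated parallel strips of 20 points each.
points = [(i * 0.5, 0.0) for i in range(20)] + [(i * 0.5, 3.0) for i in range(20)]
k_small, k_final = 8, 2
small_labels = kmeans(points, k_small)
final = single_linkage_merge(points, small_labels, k_final)
```

Over-segmenting with K-means and then merging by nearest-point distance lets the merged groups follow elongated shapes, because single linkage chains adjacent small clusters together.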
Intelligent robots frequently need to explore the objects in their working environments. Modern sensors have enabled robots to learn object properties via perception of multiple modalities. However, object exploration in the real world poses a challenging trade-off between information gains and exploration action costs. Mixed observability Markov decision process (MOMDP) is a framework for planning under uncertainty, while accounting for both fully and partially observable components of the state. Robot perception frequently has to face such mixed observability. This work enables a robot equipped with an arm to dynamically construct query-oriented MOMDPs for multi-modal predicate identification (MPI) of objects. The robot's behavioral policy is learned from two datasets collected using real robots. Our approach enables a robot to explore object properties in a way that is significantly faster while improving accuracies in comparison to existing methods that rely on handcoded exploration strategies.
There is little agreement on a universal lexical query, or even on an explicit definition of nanotechnology, for monitoring articles and patents in the field. Here, in the light of a proposed definition, a set of case studies has been conducted to remove keywords that are not exclusive to nanotechnology. This resulted in a collective and abridged lexical query (CALQ) for nanotechnology delineation. Through bibliometric quantification of already-proposed as well as novel keywords, it was shown that all keywords included in CALQ have considerable exclusive retrieval and precision, while the removed keywords do not satisfy either of these numerical thresholds. This approach may also be applied to future updates of CALQ.