Subspace clustering methods based on ℓ1, ℓ2 or nuclear norm regularization have become very popular due to their simplicity, theoretical guarantees and empirical success. However, the choice of the regularizer can greatly impact both theory and practice. For instance, ℓ1 regularization is guaranteed to give a subspace-preserving affinity (i.e., there are no connections between points from different subspaces) under broad conditions (e.g., arbitrary subspaces and corrupted data). However, it requires solving a large-scale convex optimization problem. On the other hand, ℓ2 and nuclear norm regularization provide efficient closed-form solutions, but require very strong assumptions to guarantee a subspace-preserving affinity, e.g., independent subspaces and uncorrupted data. In this paper we study a subspace clustering method based on orthogonal matching pursuit. We show that the method is both computationally efficient and guaranteed to give a subspace-preserving affinity under broad conditions. Experiments on synthetic data verify our theoretical analysis, and applications in handwritten digit and face clustering show that our approach achieves the best trade-off between accuracy and efficiency.
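The abstract does not spell out the algorithm, so the following is only a minimal sketch, assuming the common self-expressive formulation: each unit-norm point is greedily expressed by orthogonal matching pursuit (OMP) over all other points, and the absolute coefficients are symmetrized into an affinity that would then be passed to spectral clustering (omitted here). The names `omp`, `ssc_omp_affinity` and the parameters `k_max` and `tol` are hypothetical. The toy data uses two orthogonal 2-D subspaces, so a subspace-preserving affinity has no nonzero entries between the two groups.

```python
import numpy as np


def omp(D, y, k_max, tol=1e-8):
    """Orthogonal matching pursuit: represent y with at most k_max columns of D."""
    residual = y.copy()
    support = []
    sol = np.zeros(0)
    for _ in range(k_max):
        if np.linalg.norm(residual) <= tol:
            break
        # Greedy step: pick the column most correlated with the current residual.
        scores = np.abs(D.T @ residual)
        if support:
            scores[support] = -1.0  # never reselect a column
        support.append(int(np.argmax(scores)))
        # Orthogonal step: refit y by least squares on the whole support.
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    coef = np.zeros(D.shape[1])
    if support:
        coef[support] = sol
    return coef


def ssc_omp_affinity(X, k_max):
    """Express each column of X by OMP over the other columns; return |C| + |C|^T."""
    X = X / np.linalg.norm(X, axis=0, keepdims=True)  # unit-norm data points
    n = X.shape[1]
    C = np.zeros((n, n))
    for j in range(n):
        others = np.delete(np.arange(n), j)  # exclude the trivial solution x_j = x_j
        C[others, j] = omp(X[:, others], X[:, j], k_max)
    return np.abs(C) + np.abs(C).T


# Toy data: 6 points on span(e1, e2) and 6 points on span(e3, e4) in R^4.
rng = np.random.default_rng(0)
X = np.zeros((4, 12))
X[:2, :6] = rng.standard_normal((2, 6))
X[2:, 6:] = rng.standard_normal((2, 6))
W = ssc_omp_affinity(X, k_max=2)
print(np.all(W[:6, 6:] == 0))  # True: no cross-subspace connections
```

Because OMP only ever adds points that correlate with the current residual, it never needs to solve the large convex program that ℓ1 regularization requires; the cost per point is `k_max` small least-squares fits.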
Kinases remain an important drug target class within the pharmaceutical industry; however, the rational design of kinase inhibitors is plagued by the complexity of gaining selectivity for a small number of proteins within a family of more than 500 related enzymes. Herein we show how a computational method for identifying the location and thermodynamic properties of water molecules within a protein binding site can yield insight into previously inexplicable selectivity and structure-activity relationships. Four kinase systems (Src family, Abl/c-Kit, Syk/ZAP-70, and CDK2/4) were investigated, and differences in predicted water molecule locations and energetics were able to explain the experimentally observed binding selectivity profiles. The successful predictions across the range of kinases studied here suggest that this methodology could be generally applicable for predicting selectivity profiles in related targets.
Self-organizing molecular field analysis (SOMFA) is a novel technique for three-dimensional quantitative structure-activity relationships (3D-QSAR). It is simple and intuitive in concept and avoids the complex statistical tools and variable selection procedures favored by other methods. Our calculations show the method to be as predictive as the best 3D-QSAR methods available. Importantly, steric and electrostatic maps can be produced to aid the molecular design process by highlighting important molecular features. The simplicity of the technique leaves scope for further development, particularly with regard to handling molecular alignment and conformation selection. Here, the method has been used to predict the corticosteroid-binding globulin binding affinity of the "benchmark" steroids, expanded from the usual 31 compounds to 43 compounds. Test predictions have also been performed on a set of sulfonamide endothelin inhibitors.
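The core SOMFA idea, as it is commonly described, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation, assuming the molecules are already aligned and each is encoded as property values (for example a 1/0 shape indicator or an electrostatic potential) on a shared grid: a master grid is built from mean-centred activities weighted by the local property, each molecule's score is its overlap with that master grid, and activity is regressed linearly on the score. The function names are hypothetical; alignment, grid construction and cross-validation are omitted.

```python
import numpy as np


def somfa_fit(props, activities):
    """props: (n_molecules, n_gridpoints) property values on a shared, pre-aligned grid."""
    centred = activities - activities.mean()  # mean-centred activities
    # Master grid: average over molecules of (centred activity x local property).
    master = (centred[:, None] * props).mean(axis=0)
    scores = props @ master  # one SOMFA descriptor per molecule
    slope, intercept = np.polyfit(scores, activities, 1)  # activity ~ slope*score + intercept
    return master, slope, intercept


def somfa_predict(props, master, slope, intercept):
    return slope * (props @ master) + intercept


# Toy shape field: 4 molecules, 2 grid points (1 = inside the molecule, 0 = outside).
props = np.array([[1, 0], [1, 1], [0, 1], [0, 0]], dtype=float)
activities = np.array([2.0, 3.0, 1.0, 0.0])
master, slope, intercept = somfa_fit(props, activities)
print(master)  # grid weights 0.5 and 0.25
print(somfa_predict(props, master, slope, intercept))  # approximately [2, 3, 1, 0]
```

The master grid doubles as the interpretable map mentioned above: positive grid values mark regions where the property is associated with higher activity, negative values with lower activity.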
Many computer vision tasks involve processing large amounts of data contaminated by outliers, which need to be detected and rejected. While outlier detection methods based on robust statistics have existed for decades, only recently have methods based on sparse and low-rank representation been developed along with guarantees of correct outlier detection when the inliers lie in one or more low-dimensional subspaces. This paper proposes a new outlier detection method that combines tools from sparse representation with random walks on a graph. By exploiting the property that data points can be expressed as sparse linear combinations of each other, we obtain an asymmetric affinity matrix among data points, which we use to construct a weighted directed graph. By defining a suitable Markov chain from this graph, we establish a connection between inliers/outliers and essential/inessential states of the Markov chain, which allows us to detect outliers by using random walks. We provide a theoretical analysis that justifies the correctness of our method under geometric and connectivity assumptions. Experimental results on image databases demonstrate its superiority over state-of-the-art sparse and low-rank outlier detection methods.
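The link between outliers and inessential states can be illustrated with a small random walk. This is a minimal sketch, assuming the sparse coefficients have already been computed and turned into a nonnegative affinity `A` whose row i holds the weights used to represent point i. Inliers are represented only by other inliers, so no edge leads from an inlier back to an outlier, and probability mass drains out of the outlier states. The function name, the number of steps `T` and the threshold `0.05` are illustrative assumptions, not settings from the paper.

```python
import numpy as np


def random_walk_outlier_scores(A, T=100):
    """A[i, j] >= 0: weight of point j in the sparse representation of point i."""
    P = A / A.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    n = P.shape[0]
    pi = np.full(n, 1.0 / n)  # start the walk from the uniform distribution
    avg = np.zeros(n)
    for _ in range(T):
        pi = pi @ P  # one step of the random walk
        avg += pi
    return avg / T  # averaged visiting probability per point


# Points 0-3 (inliers) are represented only by each other; points 4-5 (outliers)
# also use inliers, so no edge ever leads from an inlier back to an outlier.
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 1],
    [0, 0, 1, 1, 1, 0],
], dtype=float)
scores = random_walk_outlier_scores(A)
print(np.flatnonzero(scores < 0.05))  # [4 5]
```

Averaging the distribution over `T` steps, rather than using only the final step, keeps the scores meaningful even if the inlier part of the chain is periodic.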