Networks or graphs can easily represent a diverse set of data sources that are characterized by interacting units or actors. Social networks, representing people who communicate with each other, are one example. Communities or clusters of highly connected actors form an essential feature in the structure of several empirical networks. Spectral clustering is a popular and computationally feasible method to discover these communities. The stochastic blockmodel [Social Networks 5 (1983) 109-137] is a social network model with well-defined communities; each node is a member of one community. For a network generated from the stochastic blockmodel, we bound the number of nodes "misclustered" by spectral clustering. The asymptotic results in this paper are the first clustering results that allow the number of clusters in the model to grow with the number of nodes, hence the name high-dimensional. In order to study spectral clustering under the stochastic blockmodel, we first show that under the more general latent space model, the eigenvectors of the normalized graph Laplacian asymptotically converge to the eigenvectors of a "population" normalized graph Laplacian. Aside from the implication for spectral clustering, this provides insight into a graph visualization technique. Our method of studying the eigenvectors of random matrices is original.

Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/11-AOS887 by the Institute of Mathematical Statistics (http://www.imstat.org).
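The pipeline this abstract analyzes can be sketched in Python with NumPy: draw a network from a stochastic blockmodel, embed the nodes with eigenvectors of the normalized graph Laplacian (written here as L = D^{-1/2} A D^{-1/2}), cluster the rows, and count misclustered nodes. This is a minimal illustration, not the paper's code. The block sizes, the probability matrix `B`, the deterministic farthest-point k-means helper, and the choice of the k eigenvalues largest in absolute value are all assumptions made for the example.

```python
from itertools import permutations

import numpy as np


def simulate_sbm(sizes, B, rng):
    """Draw an undirected adjacency matrix from a stochastic blockmodel.

    sizes: number of nodes in each block; B: k x k symmetric matrix of edge probabilities.
    """
    labels = np.repeat(np.arange(len(sizes)), sizes)
    P = B[labels][:, labels]                         # edge probability for every node pair
    upper = np.triu(rng.random(P.shape) < P, k=1)    # sample pairs i < j only (no self-loops)
    A = (upper | upper.T).astype(float)              # symmetrize
    return A, labels


def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        assign = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return assign


def spectral_clustering(A, k):
    """Cluster nodes using the top-k eigenvectors of L = D^{-1/2} A D^{-1/2}."""
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))  # guard against isolated nodes
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)
    top = np.argsort(np.abs(vals))[::-1][:k]            # k largest eigenvalues in absolute value
    return kmeans(vecs[:, top], k)


def misclustered(labels, assign, k):
    """Nodes whose cluster disagrees with their block, under the best relabeling of clusters."""
    return min(int(np.sum(np.array(perm)[assign] != labels)) for perm in permutations(range(k)))


rng = np.random.default_rng(0)
B = np.full((3, 3), 0.05) + 0.45 * np.eye(3)  # within-block 0.5, between-block 0.05
A, labels = simulate_sbm([100, 100, 100], B, rng)
assign = spectral_clustering(A, 3)
print("misclustered nodes:", misclustered(labels, assign, 3))
```

The brute-force search over relabelings is only practical for small k, so this sketch illustrates the misclustering count rather than the growing-k regime the paper studies.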
1. Detecting evolutionary shifts in trait evolution from extant taxa is motivated by the study of convergent evolution and by the goal of correlating shifts in traits with habitat changes or with changes in other phenotypes.
2. We propose here a phylogenetic lasso method to study trait evolution from comparative data and detect past changes in the expected mean trait values. We use the Ornstein-Uhlenbeck process, which can model a changing adaptive landscape over time and over lineages.
3. Our method is very fast, running in minutes for hundreds of species, and can handle multiple traits. We also propose a phylogenetic Bayesian information criterion that accounts for the phylogenetic correlation between species, as well as for the complexity of estimating an unknown number of shifts at unknown locations in the phylogeny. This criterion does not suffer from model overfitting and has high precision, so it offers a conservative alternative to other information criteria.
4. Our re-analysis of Anolis lizard data suggests a more conservative scenario of morphological adaptation and convergence than previously proposed. Software is available on GitHub.
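The core idea of a phylogenetic lasso can be sketched as an ordinary lasso run after "whitening" the data by the phylogenetic covariance between tips. This is a simplified sketch, not the authors' GitHub software. It assumes the tip covariance `V` and the shift design matrix `X` are already known (under the Ornstein-Uhlenbeck model both depend on the selection strength, which is taken as given here), it omits the intercept (root value assumed to be 0), and it does not implement the phylogenetic BIC. The 4-tip tree, the Brownian-motion covariance used in place of an OU covariance, and the penalty `lam` are hypothetical.

```python
import numpy as np


def soft_threshold(z, lam):
    """Lasso shrinkage operator: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def phylo_lasso(y, X, V, lam, n_iter=500):
    """Lasso for shift detection after whitening by the tip covariance V.

    y: trait values at the n tips; X: n x m shift design (column j = effect of a unit
    shift on edge j on each tip's expected value); V: n x n phylogenetic covariance.
    Minimizes 0.5 * ||W y - W X b||^2 + lam * ||b||_1 with W = C^{-1}, V = C C^T.
    """
    C = np.linalg.cholesky(V)               # lower-triangular factor, V = C C^T
    yw = np.linalg.solve(C, y)              # whitened response
    Xw = np.linalg.solve(C, X)              # whitened design
    col_sq = (Xw ** 2).sum(axis=0)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):                 # cyclic coordinate descent
        for j in range(X.shape[1]):
            partial = yw - Xw @ beta + Xw[:, j] * beta[j]  # residual ignoring edge j
            beta[j] = soft_threshold(Xw[:, j] @ partial, lam) / col_sq[j]
    return beta


# Hypothetical 4-tip tree ((A,B),(C,D)) with unit branch lengths.
# Edges: 0 root->(A,B), 1 root->(C,D), 2 ->A, 3 ->B, 4 ->C, 5 ->D.
X = np.array([[1, 0, 1, 0, 0, 0],
              [1, 0, 0, 1, 0, 0],
              [0, 1, 0, 0, 1, 0],
              [0, 1, 0, 0, 0, 1]], dtype=float)
# Brownian-motion covariance (shared path length), standing in for an OU covariance.
V = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [0, 0, 2, 1],
              [0, 0, 1, 2]], dtype=float)
y = np.array([0.0, 0.0, 3.0, 3.0])          # noise-free data: one shift of +3 on edge 1
beta = phylo_lasso(y, X, V, lam=0.1)
print(np.round(beta, 3))
```

In this toy example the lasso places the whole shift on the internal edge leading to C and D rather than splitting it over the two tip edges, because one nonzero coefficient has a smaller l1 norm; the estimate is shrunk slightly below 3 by the penalty.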
Summary
Biological and social systems consist of myriad interacting units. The interactions can be represented in the form of a graph or network. Measurements of these graphs can reveal the underlying structure of these interactions, which provides insight into the systems that generated the graphs. Moreover, in applications such as connectomics, social networks, and genomics, graph data are accompanied by contextualizing measures on each node. We utilize these node covariates to help uncover latent communities in a graph, using a modification of spectral clustering. Statistical guarantees are provided under a joint mixture model that we call the node-contextualized stochastic blockmodel, including a bound on the misclustering rate. The bound is used to derive conditions for achieving perfect clustering. For most simulated cases, covariate-assisted spectral clustering yields results superior both to regularized spectral clustering without node covariates and to an adaptation of canonical correlation analysis. We apply our clustering method to large brain graphs derived from diffusion MRI data, using the node locations or neurological region membership as covariates. In both cases, covariate-assisted spectral clustering yields clusters that are easier to interpret neurologically.
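One way to combine a graph with node covariates in a spectral method is to add a covariate similarity term to a regularized Laplacian before taking eigenvectors. The sketch below assumes the form L̃ = L_τ L_τ + α X Xᵀ, with L_τ = D_τ^{-1/2} A D_τ^{-1/2} and D_τ = D + τI; treat this form, the default τ = average degree, the weight α = 0.01, and the simulated graph and covariates as assumptions for illustration rather than the paper's exact settings. The graph alone carries a weak community signal here, while the covariates are informative.

```python
import numpy as np


def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        assign = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return assign


def casc_embedding(A, X, k, alpha, tau=None):
    """Top-k eigenvectors of L_tau @ L_tau + alpha * X @ X.T (assumed form)."""
    deg = A.sum(axis=1)
    if tau is None:
        tau = deg.mean()                        # assumed default: average degree
    d = 1.0 / np.sqrt(deg + tau)
    L_tau = d[:, None] * A * d[None, :]         # D_tau^{-1/2} A D_tau^{-1/2}
    L_tilde = L_tau @ L_tau + alpha * (X @ X.T)
    vals, vecs = np.linalg.eigh(L_tilde)        # eigenvalues in ascending order
    return vecs[:, -k:]                         # eigenvectors of the k largest eigenvalues


rng = np.random.default_rng(1)
n = 200
labels = np.repeat([0, 1], n // 2)
B = np.array([[0.10, 0.05], [0.05, 0.10]])      # weak community signal in the graph
upper = np.triu(rng.random((n, n)) < B[labels][:, labels], k=1)
A = (upper | upper.T).astype(float)
X = np.eye(2)[labels] + 0.2 * rng.standard_normal((n, 2))  # informative node covariates
assign = kmeans(casc_embedding(A, X, k=2, alpha=0.01), 2)
errors = int(min(np.sum(assign != labels), np.sum(assign == labels)))
print("misclustered nodes:", errors)
```

The weight α controls how much the covariates pull the embedding away from the graph-only solution; setting `alpha=0` recovers a regularized spectral clustering embedding (of L_τ squared), which makes it easy to compare the two in a simulation like this one.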
In directed graphs, relationships are asymmetric, and these asymmetries contain essential structural information about the graph. Directed relationships lead to a new type of clustering that is not feasible in undirected graphs. We propose a spectral co-clustering algorithm called DI-SIM for asymmetry discovery and directional clustering. A stochastic co-blockmodel is introduced to show favorable properties of DI-SIM. To account for the sparse and highly heterogeneous nature of directed networks, DI-SIM uses the regularized graph Laplacian and projects the rows of the eigenvector matrix onto the sphere. A nodewise asymmetry score and DI-SIM are used to analyze the clustering asymmetries in the networks of Enron emails, political blogs, and the Caenorhabditis elegans chemical connectome. In each example, a subset of nodes has clustering asymmetries; these nodes send edges to one cluster but receive edges from another cluster. Such nodes yield insightful information (e.g., communication bottlenecks) about directed networks, but are missed if the analysis ignores edge direction.

Clustering is widely used to study the structure of social, biological, and technological networks because it provides an aggregated and simplified representation of the complex interactions. The difficulty of the clustering problem has inspired an extensive literature devoted to the statistical and computational issues. Spectral approximation algorithms have become popular due to their computational speed and empirical performance across domain areas.

In the clustering literature, the vast majority of models and algorithms presume that the interactions are symmetric or undirected. In some settings, the relationships can be well approximated as symmetric. However, asymmetric or directed relationships more fully represent the vast majority of interactions. For example, in the gene regulatory network, one gene drives the transcription of another gene.
In the power grid network, electricity flows from one node to another. In a communication network, one node initiates the conversation. In other examples, it might be easier to observe the relationship without direction, but the direction remains of fundamental importance. For example, in a social network, a business searching for "trend leaders" wants to know the direction of influence in relationships, which is not directly observable. In a regulatory network, knockout experiments seek to estimate the direction of gene regulation. For many questions of interest, making the edges undirected does not provide an appropriate approximation. In all of these examples, the direction of the edges is essential to the function of the network. Directionality gives asymmetry to a relationship, and the standard notion of clustering is insufficient to explore and appropriately aggregate asymmetric relationships in our data examples.

To extend clustering to directed networks, we use Hartigan's notion of co-clustering, which he proposed as a way to simultaneously cluster both the rows and the columns o...
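The directional co-clustering described above can be sketched as follows: regularize a directed adjacency matrix by out- and in-degrees, take its leading left and right singular vectors, project their rows onto the unit sphere, and run k-means separately to obtain sending and receiving clusters. This is a minimal sketch based on the abstract's description, not the authors' implementation. The specific Laplacian L = O_τ^{-1/2} A P_τ^{-1/2} (O, P: diagonal out- and in-degree matrices), the use of singular vectors as the "eigenvector matrix", the default τ = mean degree, and the simulated co-blockmodel are assumptions, and the nodewise asymmetry score is not computed.

```python
import numpy as np


def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        assign = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return assign


def unit_rows(M):
    """Project each row onto the unit sphere (all-zero rows stay at zero)."""
    return M / np.maximum(np.linalg.norm(M, axis=1, keepdims=True), 1e-12)


def di_sim(A, k, tau=None):
    """Sending and receiving clusters from the top-k singular vectors of
    L = O_tau^{-1/2} A P_tau^{-1/2} (O: out-degrees, P: in-degrees)."""
    out_deg = A.sum(axis=1)
    in_deg = A.sum(axis=0)
    if tau is None:
        tau = out_deg.mean()                # equals in_deg.mean(): both are (#edges) / n
    L = A / np.sqrt(out_deg + tau)[:, None] / np.sqrt(in_deg + tau)[None, :]
    U, _, Vt = np.linalg.svd(L)             # singular values in descending order
    send = kmeans(unit_rows(U[:, :k]), k)   # clusters by outgoing edges
    recv = kmeans(unit_rows(Vt[:k].T), k)   # clusters by incoming edges
    return send, recv


def n_errors(truth, found):
    """Misclustered count for two clusters, allowing the labels to be swapped."""
    return int(min(np.sum(truth != found), np.sum(truth == found)))


rng = np.random.default_rng(2)
n = 200
y = np.repeat([0, 1], n // 2)               # hypothetical sending clusters
z = np.arange(n) % 2                        # hypothetical receiving clusters (another partition)
B = np.array([[0.30, 0.02], [0.02, 0.30]])
A = (rng.random((n, n)) < B[y][:, z]).astype(float)  # edge i -> j with probability B[y_i, z_j]
np.fill_diagonal(A, 0.0)                    # no self-loops
send, recv = di_sim(A, 2)
print(n_errors(y, send), n_errors(z, recv))
```

Because the sending partition `y` and the receiving partition `z` differ in this simulation, every node sends edges mainly to one receiving cluster while belonging to a different-looking group on the receiving side, which is exactly the kind of structure that symmetrizing `A` would blur.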