Expanding network communities from representative examples

Mehler, Andrew; Skiena, Steven

doi:10.1145/1514888.1514890

Cited by 22 publications

(15 citation statements)

References 29 publications


“…Bagrow [4] did this for a measure called outwardness, defined as the degree-normalized difference between neighbors inside and outside the community. Mehler & Skiena [13] used several variations of neighbor counting methods for seeded community detection, the main ones being pure neighbor count, neighbor ratio, and binomial probability of neighbor distribution. More recently in 2013 Weber et al used another variation of a neighbor-counting metric to infer the political ideology of Twitter users, based on which community a user retweeted more frequently.…”

Section: Related Work (mentioning)

confidence: 99%

“…(a) Outwardness, the degree-normalized difference between the number of edges a node has within and without of the labeled community [4]; (b) Neighbors, the number of neighbors one has in the labeled community [13]; (c) DN-Neighbors, the degree-normalized version of Neighbors [13]; (d) BinomProb, the binomial probability that a node is in the community, given the number of neighbors it has in the labeled community [13].…”

Section: Appendix (mentioning)

confidence: 99%
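The neighbor-counting scores listed in the statement above can be illustrated with a minimal Python sketch. Several details here are assumptions rather than anything stated in the quoted text: the graph is an undirected adjacency dictionary, the sign convention for outwardness is (outside minus inside) divided by degree, and `binom_tail` is only one possible reading of "BinomProb", namely the chance of seeing at least that many seed neighbors if each neighbor were a seed independently with probability equal to the seed fraction of the graph. The function name `seed_scores` and the toy graph are hypothetical.

```python
from math import comb

def seed_scores(adj, seeds):
    """Score every non-seed node by its links into the labeled seed set.

    adj   -- dict mapping node -> set of neighbors (undirected graph)
    seeds -- set of nodes known to be in the community
    """
    # Assumed null model: a random neighbor is a seed with probability p.
    p = len(seeds) / len(adj)
    scores = {}
    for v, nbrs in adj.items():
        if v in seeds or not nbrs:
            continue  # skip labeled nodes and isolated nodes
        deg = len(nbrs)
        k_in = len(nbrs & seeds)   # neighbors inside the seed set
        k_out = deg - k_in         # neighbors outside the seed set
        # Binomial upper tail P(X >= k_in) for X ~ Bin(deg, p); an assumed
        # interpretation, not necessarily the formula used in the cited work.
        tail = sum(comb(deg, i) * p**i * (1 - p)**(deg - i)
                   for i in range(k_in, deg + 1))
        scores[v] = {
            "neighbors": k_in,
            "dn_neighbors": k_in / deg,
            "outwardness": (k_out - k_in) / deg,  # assumed sign convention
            "binom_tail": tail,
        }
    return scores

# Toy graph (hypothetical): edges a-b, a-c, a-d, b-c, c-d, d-e, e-f.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c", "e"},
    "e": {"d", "f"},
    "f": {"e"},
}
seeds = {"a", "b"}
scores = seed_scores(adj, seeds)
# Rank candidates by raw neighbor count, highest first.
ranked = sorted(scores, key=lambda v: scores[v]["neighbors"], reverse=True)
```

Under these assumptions, a lower outwardness or a smaller `binom_tail` both point toward membership, while higher `neighbors` and `dn_neighbors` do the same; the four scores need not agree on the ranking.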


Community membership identification from small seed sets

Kloumann

Kleinberg

2014

Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

114


In many applications we have a social network of people and would like to identify the members of an interesting but unlabeled group or community. We start with a small number of exemplar group members (they may be followers of a political ideology or fans of a music genre) and need to use those examples to discover the additional members. This problem gives rise to the seed expansion problem in community detection: given example community members, how can the social graph be used to predict the identities of remaining, hidden community members? In contrast with global community detection (graph partitioning or covering), seed expansion is best suited for identifying communities locally concentrated around nodes of interest. A growing body of work has used seed expansion as a scalable means of detecting overlapping communities. Yet despite growing interest in seed expansion, there are divergent approaches in the literature and there still isn't a systematic understanding of which approaches work best in different domains. Here we evaluate several variants and uncover subtle trade-offs between different approaches. We explore which properties of the seed set can improve performance, focusing on heuristics that one can control in practice. As a consequence of this systematic understanding we have found several opportunities for performance gains. We also consider an adaptive version in which requests are made for additional membership labels of particular nodes, such as one finds in field studies of social communities. This leads to interesting connections and contrasts with active learning and the trade-offs of exploration and exploitation. Finally, we explore topological properties of communities and seed sets that correlate with algorithm performance, and explain these empirical observations with theoretical ones. We evaluate our methods across multiple domains, using publicly available datasets with labeled, ground-truth communities.

“…In this case, we needed to extract fairly accurate lists of names labeled by ethnicity, but more generally we would like to produce lists of entities corresponding to members of any natural group [19].…”

Section: Extracting Name Lists From Wikipedia (mentioning)

confidence: 99%

Name-ethnicity classification from open sources

Ambekar

Ward

Mohammed

et al. 2009

Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Self Cite

213

185


The problem of ethnicity identification from names has a variety of important applications, including biomedical research, demographic studies, and marketing. Here we report on the development of an ethnicity classifier where all training data is extracted from public, non-confidential (and hence somewhat unreliable) sources. Our classifier uses hidden Markov models (HMMs) and decision trees to classify names into 13 cultural/ethnic groups with individual group accuracy comparable to earlier binary (e.g., Spanish/non-Spanish) classifiers. We have applied this classifier to over 20 million names from a large-scale news corpus, identifying interesting temporal and spatial trends on the representation of particular cultural/ethnic groups.


“…An active line of recent work has focused on the problem of "seed set expansion" in networks (5-11), a fundamental version of node ranking with the following natural definition.…”

mentioning

confidence: 99%

Block models and personalized PageRank

Kloumann

Ugander

Kleinberg

2016

Proc. Natl. Acad. Sci. U.S.A.


Methods for ranking the importance of nodes in a network have a rich history in machine learning and across domains that analyze structured data. Recent work has evaluated these methods through the "seed set expansion problem": given a subset S of nodes from a community of interest in an underlying graph, can we reliably identify the rest of the community? We start from the observation that the most widely used techniques for this problem, personalized PageRank and heat kernel methods, operate in the space of "landing probabilities" of a random walk rooted at the seed set, ranking nodes according to weighted sums of landing probabilities of different length walks. Both schemes, however, lack an a priori relationship to the seed set objective. In this work, we develop a principled framework for evaluating ranking methods by studying seed set expansion applied to the stochastic block model. We derive the optimal gradient for separating the landing probabilities of two classes in a stochastic block model and find, surprisingly, that under reasonable assumptions the gradient is asymptotically equivalent to personalized PageRank for a specific choice of the PageRank parameter α that depends on the block model parameters. This connection provides a formal motivation for the success of personalized PageRank in seed set expansion and node ranking generally. We use this connection to propose more advanced techniques incorporating higher moments of landing probabilities; our advanced methods exhibit greatly improved performance, despite being simple linear classification rules, and are even competitive with belief propagation.

Keywords: PageRank | stochastic block models | seed set expansion

The challenge of contextually ranking nodes in a network has emerged as a problem of canonical significance in many domains, with a particularly rich history of study in social and information networks (1-4). An active line of recent work has focused on the problem of "seed set expansion" in networks (5-11), a fundamental version of node ranking with the following natural definition.

In the seed set expansion problem, we are given a graph G representing some form of social or information network, and there is a hidden community of interest that we would like to find, corresponding to an internally well-connected set of nodes. We know a small subset S of the nodes in this community, and from this "seed set" S, we would like to expand outward to find the rest of the community, by ordering the rest of the nodes outside S according to some ranking criterion and proposing nodes in this order as additional members of the community. This problem arises in a wide range of domains, including settings where we are trying to find web pages that are related to a set of examples, to identify a social group from a set of sample members provided by a domain expert, or to help a user automatically populate a group they are defining in an online social-networking application.

A recent focus in the work on this problem has been the power of approaches based on ran...
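The idea of ranking nodes by weighted sums of landing probabilities, as described in the abstract above, can be sketched in Python as follows. This is an illustrative sketch, not the paper's method: the geometric weights `restart * (1 - restart)**k` are one common way of writing personalized PageRank as a truncated sum over walk lengths, the parameter name `restart` is hypothetical (its relationship to the abstract's α is not specified here), and the graph is assumed to be undirected with no isolated nodes.

```python
def landing_probabilities(adj, seeds, max_len):
    """Return [r_0, ..., r_max_len], where r_k[v] is the probability that a
    k-step uniform random walk started at a uniformly chosen seed is at v."""
    r = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in adj}
    walks = [r]
    for _ in range(max_len):
        nxt = {v: 0.0 for v in adj}
        for u, mass in r.items():
            if mass:
                share = mass / len(adj[u])  # spread mass evenly over neighbors
                for w in adj[u]:
                    nxt[w] += share
        r = nxt
        walks.append(r)
    return walks

def truncated_ppr(adj, seeds, restart, max_len):
    """Weighted sum of landing probabilities with geometric weights
    restart * (1 - restart)**k (an assumed, truncated PageRank-style form)."""
    score = {v: 0.0 for v in adj}
    for k, r in enumerate(landing_probabilities(adj, seeds, max_len)):
        weight = restart * (1 - restart) ** k
        for v, p in r.items():
            score[v] += weight * p
    return score

# Toy graph (hypothetical): two triangles joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
seeds = {0}
# restart=0.5 is an arbitrary illustrative value, not a recommended setting.
score = truncated_ppr(adj, seeds, restart=0.5, max_len=20)
# Propose non-seed nodes in decreasing order of score.
ranked = sorted((v for v in adj if v not in seeds), key=score.get, reverse=True)
```

In this toy example the two nodes that share the seed's triangle are proposed first. Replacing the geometric weights with a different weight sequence over walk lengths gives other rankings in the same "landing probability" space.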
