Efficiently indexing shortest paths by exploiting symmetry in graphs

Xiao, Yanghua; Wu, Wentao; Pei, Jian; Wang, Wei; He, Zhenying

doi:10.1145/1516360.1516418

Cited by 59 publications

(47 citation statements)

References 28 publications


“…Our paper addresses a different problem than the one in [20,21] since we are interested only in the length of a shortest path, not the path itself. Recently, Xiao et al. have exploited graph symmetry to obtain speed-ups for PPSP queries over simple BFS traversals [45]. Our algorithms work only on the precomputed node-to-landmark distances and do not perform any Dijkstra-type computation at query time.…”

Section: Related Work (mentioning)

confidence: 99%

Fast shortest path distance estimation in large networks

Potamias, Bonchi, Castillo, et al. 2009

Proceedings of the 18th ACM Conference on Information and Knowledge Management

238

271


We study the problem of preprocessing a large graph so that point-to-point shortest-path queries can be answered very fast. Computing shortest paths is a well studied problem, but exact algorithms do not scale to huge graphs encountered on the web, social networks, and other applications. In this paper we focus on approximate methods for distance estimation, in particular using landmark-based distance indexing. This approach involves selecting a subset of nodes as landmarks and computing (offline) the distances from each node in the graph to those landmarks. At runtime, when the distance between a pair of nodes is needed, we can estimate it quickly by combining the precomputed distances of the two nodes to the landmarks. We prove that selecting the optimal set of landmarks is an NP-hard problem, and thus heuristic solutions need to be employed. Given a budget of memory for the index, which translates directly into a budget of landmarks, different landmark selection strategies can yield dramatically different results in terms of accuracy. A number of simple methods that scale well to large graphs are therefore developed and experimentally compared. The simplest methods choose central nodes of the graph, while the more elaborate ones select central nodes that are also far away from one another. The efficiency of the suggested techniques is tested experimentally using five different real world graphs with millions of edges; for a given accuracy, they require as much as 250 times less space than the current approach in the literature which considers selecting landmarks at random. Finally, we study applications of our method in two problems arising naturally in large-scale networks, namely, social search and community detection.
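The landmark idea described in this abstract can be illustrated with a minimal sketch. It assumes an unweighted, undirected graph stored as an adjacency dictionary, and it combines the precomputed distances with the triangle-inequality upper bound d(s, t) ≤ d(s, l) + d(l, t), taking the minimum over landmarks. The function names (`bfs_distances`, `build_landmark_index`, `estimate_distance`) are hypothetical, and the abstract does not say whether this is the exact estimator the authors use.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def build_landmark_index(adj, landmarks):
    """Offline step: one BFS per landmark, stored as landmark -> {node: distance}."""
    return {l: bfs_distances(adj, l) for l in landmarks}


def estimate_distance(index, s, t):
    """Query step: smallest d(s, l) + d(l, t) over all landmarks l.

    This is an upper bound on the true distance (assumed estimator, not
    necessarily the one used in the cited paper). Returns infinity if no
    landmark reaches both nodes.
    """
    best = float("inf")
    for dist in index.values():
        if s in dist and t in dist:
            best = min(best, dist[s] + dist[t])
    return best
```

On a path graph 0-1-2-3-4 with node 2 as the only landmark, the estimate for the pair (0, 4) is exact because the landmark lies on the shortest path, while the estimate for (0, 1) overshoots. This is one way to see why the choice of landmarks affects accuracy so strongly.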



“…Naturally, how two nodes are related to each other reflects the topology of the graph G. Path-length based definitions, such as those proposed by [7,16,34,42,44,45] help capture the relatedness of a pair of nodes solely based on the properties of the nodes and edges on the shortest path between the pair. [12] and [13] were among the first works which recognized that random-walks can also be used for measuring the significance of the graph nodes relative to a given seed node set, S ⊆ V: authors observed that, if one constructs a random-walk graph such that transition probabilities represent the separation between the seed nodes in the graph then the random-walk would spend more time on nodes that are closer to the seed nodes in S. More specifically, in [12] the authors proposed to construct a transition matrix, T_S, where edges leading away from the seed nodes are weighted less than those edges leading towards the seed nodes.…”

Section: Context-sensitive Node Significance and Personalization (mentioning)

confidence: 99%

Reducing seed noise in personalized PageRank

Huang, Candan, et al. 2016

Soc. Netw. Anal. Min.


Network based recommendation systems leverage the topology of the underlying graph and the current user context to rank objects in the database. Random-walk based techniques, such as PageRank, encode the structure of the graph in the form of a transition matrix of a stochastic process from which the significances of the nodes in the graph are inferred. Personalized PageRank (PPR) techniques complement this with a seed node set which serves as the personalization context. In this paper, we note (and experimentally show) that PPR algorithms that do not differentiate among the seed nodes may not properly rank nodes in situations where the seed set is incomplete and/or noisy. To tackle this problem, we propose alternative robust personalized PageRank (RPR) strategies, which are insensitive to noise in the set of seed nodes and in which the rankings are not overly biased towards the seed nodes. In particular, we show that novel teleportation discounting and seed-set maximal PPR techniques help eliminate harmful bias of individual seed nodes and provide effective seed differentiation to lead to more accurate rankings. We also show that the proposed techniques lead to efficient implementations, where existing approximation algorithms and/or parallel implementations for computing the PPR scores can be easily leveraged. Moreover, the proposed formulations are reuse-promoting in the sense that, it is possible to divide the work relative to individual seed nodes and cache the intermediary results obtained during the computation, and especially in systems with large query throughputs, it may be possible to cluster queries based on the partial overlaps between the seed sets and reduce the overall robust PPR computation costs. Experiment results show that the proposed techniques are efficient and highly effective in improving recommendations and eliminating unwanted bias due to imperfections in the seed set.
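For readers unfamiliar with the baseline this abstract builds on, the following is a minimal sketch of standard personalized PageRank by power iteration, in which the random walk teleports back to a uniformly chosen seed node with probability `alpha`. It is not the robust (RPR) variant proposed in the paper. The function name, the default `alpha=0.15`, the fixed iteration count, and the handling of dangling nodes are all assumptions made for illustration.

```python
def personalized_pagerank(adj, seeds, alpha=0.15, iterations=50):
    """Plain PPR by power iteration on an adjacency dict {node: [out-neighbours]}.

    With probability `alpha` the walk jumps to a uniformly chosen seed;
    otherwise it follows a uniformly chosen out-edge. Mass on nodes with
    no out-edges is returned to the seeds (an assumed convention).
    """
    seeds = set(seeds)
    nodes = list(adj)
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        nxt = {n: alpha * teleport[n] for n in nodes}
        for u in nodes:
            out = adj[u]
            if out:
                share = (1.0 - alpha) * rank[u] / len(out)
                for v in out:
                    nxt[v] += share
            else:
                # Dangling node: send its remaining mass back to the seeds.
                for s in seeds:
                    nxt[s] += (1.0 - alpha) * rank[u] / len(seeds)
        rank = nxt
    return rank
```

Because every seed receives the same teleportation weight here, a single noisy seed pulls the ranking toward its own neighbourhood, which is the kind of bias the abstract says the proposed techniques aim to reduce.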


“…Note that localities can be distance-constrained or size-constrained. Common definitions include h-hop neighborhoods (Boldi et al, 2011; Cohen et al, 2003; Wei, 2010; Xiao et al, 2009; Zhou et al, 2009), reachability neighborhoods (Cohen et al, 2003), cluster/partition neighborhoods (Feige et al, 2005; Karypis and Kumar, 1998; Newman, 2006), or hitting distance neighborhoods (Chen et al, 2008; Mei et al, 2008). One straight-forward way to identify the locality of a seed node n is to perform breadth-first search around n to locate the closest L nodes in linear time to the size of the locality.…”

Section: Locality Selection (mentioning)

confidence: 99%
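The breadth-first locality selection mentioned in the statement above can be sketched as follows. This is an illustrative reading, not the citing authors' implementation: it assumes an unweighted graph stored as an adjacency dictionary, counts the seed itself among the L nodes, and breaks ties among equally distant nodes by adjacency-list order. The name `closest_nodes` and the parameter `limit` (standing in for L, assumed to be at least 1) are hypothetical.

```python
from collections import deque


def closest_nodes(adj, seed, limit):
    """Return up to `limit` nodes nearest to `seed` in BFS (hop-distance) order.

    The search stops as soon as `limit` nodes have been collected, so the
    work done is proportional to the size of the explored locality rather
    than to the size of the whole graph.
    """
    visited = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue and len(visited) < limit:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                visited.append(v)
                queue.append(v)
                if len(visited) == limit:
                    break
    return visited
```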

“…Due to the wide-spread use of graphs in analysis, mining, and visualization of interconnected data, there are many definitions of the node distance and proximity. Path-length based definitions, such as those used by Palmer et al (2006), Boldi et al (2011), Cohen et al (2003), Wei (2010), Xiao et al (2009), Zhou et al (2009), are useful when the relatedness can be captured solely based on the properties of the nodes and edges on the shortest path (based on some definition of path-length). Random-walk based definitions, such as hitting distance (Chen et al, 2008; Mei et al, 2008) and personalized PageRank (PPR) score (Balmin et al, 2004; Chakrabarti, 2007; Jeh and Widom, 2002; Tong et al, 2006a; Tong et al, 2007; Liu et al, 2013; Lofgren et al, 2014; Maehara et al, 2014), of node relatedness, on the other hand, also take into account the density of the edges: intuitively, as in path-length based definitions, a node can be said to be...…”

Section: Introduction (mentioning)

confidence: 99%

Locality-sensitive and Re-use Promoting Personalized PageRank computations

2015


Node distance/proximity measures are used for quantifying how nearby or otherwise related two or more nodes on a graph are. In particular, personalized PageRank (PPR) based measures of node proximity have been shown to be highly effective in many prediction and recommendation applications. Despite its effectiveness, however, the use of personalized PageRank for large graphs is difficult due to its high computation cost. In this paper, we propose a Locality-sensitive, Re-use promoting, approximate Personalized PageRank (LR-PPR) algorithm for efficiently computing the PPR values relying on the localities of the given seed nodes on the graph: (a) The LR-PPR algorithm is locality sensitive in the sense that it reduces the computational cost of the PPR computation process by focusing on the local neighborhoods of the seed nodes. (b) LR-PPR is re-use promoting in that instead of performing a monolithic computation for the given seed node set using the entire graph, LR-PPR divides the work into localities of the seeds and caches the intermediary results obtained during the computation. These cached results are then reused for future queries sharing seed nodes. Experiment results for different data sets and under different scenarios show that LR-PPR algorithm is highly-efficient and accurate.

