We study the complexity of local graph centrality estimation, with the goal of approximating the centrality score of a given target node while exploring only a sublinear number of nodes/arcs of the graph and performing a sublinear number of elementary operations. We develop a technique, which we apply to the PageRank and heat kernel centralities, for building a low-variance score estimator through a local exploration of the graph. We obtain an algorithm that, given any node in any graph of $m$ arcs, with probability $1-\delta$ computes a multiplicative $(1\pm\epsilon)$-approximation of its score by examining only $\tilde{O}\big(\min\big(m^{2/3}\Delta^{1/3}d^{-2/3},\ m^{4/5}d^{-3/5}\big)\big)$ nodes/arcs, where $\Delta$ and $d$ are respectively the maximum and average outdegree of the graph (omitting for readability $\mathrm{poly}(\epsilon^{-1})$ and $\mathrm{polylog}(\delta^{-1})$ factors). A similar bound holds for computational complexity. We also prove a lower bound of $\Omega\big(\min\big(m^{1/2}\Delta^{1/2}d^{-1/2},\ m^{2/3}d^{-1/3}\big)\big)$ for both query complexity and computational complexity. Moreover, our technique yields an $\tilde{O}(n^{2/3})$ query complexity algorithm for the graph access model of Brautbar et al. [14], widely used in social network mining; we show this algorithm is optimal up to a sublogarithmic factor. These are the first algorithms yielding worst-case sublinear bounds for general directed graphs and any choice of the target node.

* This is the full version of a paper accepted for publication at IEEE FOCS 2018.
Introduction

Computing graph centralities efficiently is essential to modern network analysis. With the advent of web and social networks, the prototypical scenario involves massive graphs on millions or even billions of nodes and arcs. On these input graphs, traditional approaches such as Monte Carlo simulations and algebraic techniques are often impractical, if not entirely useless, since their cost can scale linearly or superlinearly with the size of the graph. An alternative approach is that of local graph algorithms, which, broadly speaking, work by exploring only a small portion of the graph around a given target node. Local algorithms are justified by the fact that one often does not need an exact computation of the entire score vector, but only a quick approximation for a few nodes of interest. In exchange, one hopes to drastically reduce both the running time and the portion of the graph to be fetched. One of the best-known examples is perhaps local graph clustering [4,54,33].

In this paper we address the problem of locally approximating the centrality score of a node in a graph, focusing on the PageRank and heat kernel centralities. PageRank [20] is a classic graph centrality measure with a vast number of applications including local graph clustering [4], trendsetter identification [52], spam filtering [40], link prediction [39] and many more (see [35] and [23]); it has been named one of the top 10 algorithms in data mining [55]. Heat kernel [24] can be seen as a variant of PageRank that satisfies the heat equation. Its applications span biological network analysis [31,30] and solving local linear systems...