Diffusion is a fundamental graph process, underpinning such phenomena as epidemic disease contagion and the spread of innovation by word-of-mouth. We address the algorithmic problem of finding a set of k initial seed nodes in a network so that the expected size of the resulting cascade is maximized, under the standard independent cascade model of network diffusion. Runtime is a primary consideration for this problem due to the massive size of the relevant input networks. We provide a fast algorithm for the influence maximization problem, obtaining the near-optimal approximation factor of (1 − 1/e − ε), for any ε > 0, in time O((m + n) k ε⁻² log n). Our algorithm is runtime-optimal (up to a logarithmic factor) with respect to network size, and substantially improves upon the previously best-known algorithms, which run in time Ω(mnk · POLY(ε⁻¹)). Furthermore, our algorithm can be modified to allow early termination: if it is terminated after O(β(m + n) k log n) steps for some β < 1 (which can depend on n), then it returns a solution with approximation factor O(β). Finally, we show that this runtime is optimal (up to logarithmic factors) for any β and fixed seed size k.
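To make the problem statement concrete, the following is a minimal sketch of the independent cascade model and of a plain Monte Carlo greedy seed selection. It is a simple baseline, not the near-linear-time algorithm the abstract describes. The graph encoding (a dict mapping each node to a list of `(neighbor, probability)` pairs) and the function names `simulate_cascade`, `expected_spread`, and `greedy_seeds` are assumptions made for illustration.

```python
import random
from collections import deque


def simulate_cascade(graph, seeds, rng):
    """Run one independent cascade; graph[u] is a list of (v, p) pairs."""
    active = set(seeds)
    frontier = deque(seeds)
    while frontier:
        u = frontier.popleft()
        for v, p in graph.get(u, []):
            # A newly active node gets one chance to activate each
            # still-inactive out-neighbor, succeeding with probability p.
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return active


def expected_spread(graph, seeds, trials=1000, seed=0):
    """Monte Carlo estimate of the expected cascade size from `seeds`."""
    rng = random.Random(seed)
    total = sum(len(simulate_cascade(graph, seeds, rng)) for _ in range(trials))
    return total / trials


def greedy_seeds(graph, nodes, k, trials=200, seed=0):
    """Baseline greedy: repeatedly add the node with the best estimated spread."""
    chosen = []
    for _ in range(k):
        candidates = [v for v in nodes if v not in chosen]
        best = max(
            candidates,
            key=lambda v: expected_spread(graph, chosen + [v], trials, seed),
        )
        chosen.append(best)
    return chosen
```

Each greedy step re-simulates every candidate, so the cost grows with the number of nodes times the cost of a simulation, which is the kind of runtime the abstract sets out to improve.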
We study the power of local information algorithms for optimization problems on social and technological networks. We focus on sequential algorithms where the network topology is initially unknown and is revealed only within a local neighborhood of vertices that have been irrevocably added to the output set. This framework models the behavior of an external agent that does not have direct access to the network data, such as a user interacting with an online social network. We study a range of problems under this model of algorithms with local information. When the underlying graph is a preferential attachment network, we show that one can find the root (i.e., the initial node) in a polylogarithmic number of steps, using a local algorithm that repeatedly queries the visible node of maximum degree. This addresses an open question of Bollobás and Riordan. This result is motivated by its implications: we obtain polylogarithmic approximations to problems such as finding the smallest subgraph that connects a subset of nodes, finding the highest-degree nodes, and finding a subgraph that maximizes vertex coverage per subgraph size. Motivated by problems faced by recruiters in online networks, we also consider network coverage problems on arbitrary graphs. We demonstrate a sharp threshold on the level of visibility required: at a certain visibility level it is possible to design algorithms that nearly match the best approximation possible even with full access to the graph structure, but with any less information it is impossible to achieve a non-trivial approximation. We conclude that a network provider's decision of how much structure to make visible to its users can have a significant effect on a user's ability to interact strategically with the network.
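The rule "repeatedly query the visible node of maximum degree" can be sketched as follows. This is an illustration under assumptions, not the paper's algorithm or analysis: the graph is a dict of adjacency lists with sortable node labels, the degrees of visible nodes are assumed to be observable, ties go to the smallest label, and the name `max_degree_local_search` is hypothetical.

```python
def max_degree_local_search(adj, start, target, max_steps):
    """Grow a set from `start`, always adding the visible node of highest degree.

    Returns the nodes in the order they were added and whether `target`
    was reached within `max_steps` additions.
    """
    selected = [start]
    chosen = {start}
    # Only neighbors of already selected nodes are visible to the algorithm.
    visible = set(adj[start]) - chosen
    for _ in range(max_steps):
        if target in chosen or not visible:
            break
        # Sorting first makes tie-breaking deterministic (smallest label wins).
        nxt = max(sorted(visible), key=lambda v: len(adj[v]))
        selected.append(nxt)
        chosen.add(nxt)
        visible |= set(adj[nxt])
        visible -= chosen
    return selected, target in chosen
```

For example, on a small tree whose hub is node 0, a search started at a leaf climbs toward the hub by following increasing degree.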
A fundamental problem arising in many applications in Web science and social network analysis is the problem of identifying all nodes in a network whose PageRank exceeds a given threshold ∆. In this paper, we study the probabilistic version of the problem where, given an arbitrary approximation factor c > 1, we are asked to output a set S of nodes such that with high probability, S contains all nodes of PageRank at least ∆, and no node of PageRank smaller than ∆/c. We call this problem SignificantPageRanks. We develop a nearly optimal, local algorithm for the problem with runtime complexity Õ(n/∆) on networks with n nodes, where the tilde hides a polylogarithmic factor. We show that any algorithm for solving this problem must have runtime of Ω(n/∆), rendering our algorithm optimal up to logarithmic factors. Our algorithm has sublinear time complexity for applications including Web crawling and Web search that require efficient identification of nodes whose PageRanks are above a threshold ∆ = n^δ, for some constant 0 < δ < 1. Our algorithm comes with two main technical contributions. The first is a multi-scale sampling scheme for a basic matrix problem that could be of interest on its own. For us, it appears as an abstraction of a subproblem we need to tackle in order to solve the SignificantPageRanks problem, but we hope that this abstraction will be useful in designing fast algorithms for identifying nodes that are significant beyond PageRank measurements. In the abstract matrix problem it is assumed that one can access an unknown right-stochastic matrix by querying its rows, where the cost of a query and the accuracy of the answers depend on a precision parameter ε. At a cost proportional to 1/ε, the query will return a list of O(1/ε) entries and their indices that provide an ε-precision approximation of the row. Our task is to find a set that contains all columns whose sum is at least ∆, and omits any column whose sum is less than ∆/c.
Our multi-scale sampling scheme solves this problem with cost Õ(n/∆), while traditional sampling algorithms would take time Θ((n/∆)²). Our second main technical contribution is a new local algorithm for approximating personalized PageRank, which is more robust than the earlier ones developed in [2,11] and is highly efficient particularly for networks with large in-degrees or out-degrees. Together with our multi-scale sampling scheme we are able to optimally solve the SignificantPageRanks problem.
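The abstract matrix problem can be illustrated with a naive uniform row-sampling estimator. This sketch is a simple baseline, not the multi-scale scheme described above, and it ignores the precision parameter ε by assuming every queried row is returned exactly. The sparse row encoding (a dict from column index to entry), the function names, and the acceptance cutoff ∆/√c are all assumptions chosen for illustration.

```python
import math
import random


def estimate_column_sums(rows, n_cols, samples, seed=0):
    """Estimate column sums of a row-stochastic matrix from uniformly sampled rows.

    `rows` is a list of sparse rows, each a dict {column index: entry}.
    """
    rng = random.Random(seed)
    n_rows = len(rows)
    totals = [0.0] * n_cols
    for _ in range(samples):
        row = rows[rng.randrange(n_rows)]
        for j, value in row.items():
            totals[j] += value
    # Rescale so the sampled total is an unbiased estimate of the full column sum.
    scale = n_rows / samples
    return [scale * t for t in totals]


def significant_columns(rows, n_cols, delta, c, samples, seed=0):
    """Return columns whose estimated sum clears a cutoff between delta/c and delta."""
    # ASSUMPTION: the geometric mean of delta/c and delta is used as the cutoff.
    cutoff = delta / math.sqrt(c)
    estimates = estimate_column_sums(rows, n_cols, samples, seed)
    return [j for j, est in enumerate(estimates) if est >= cutoff]
```

How many samples such an estimator needs before its answers can be trusted is exactly the cost question the abstract addresses; the sketch makes no claim about that number.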
Social and other networks have been shown empirically to exhibit high edge clustering: that is, the density of local neighborhoods, as measured by the clustering coefficient, is often much larger than the overall edge density of the network. In social networks, a desire for tight-knit circles of friendships (the colloquial "social clique") is often cited as the primary driver of such structure. We introduce and analyze a new network formation game in which rational players must balance edge purchases with a desire to maximize their own clustering coefficient. Our results include the following:
- Construction of a number of specific families of equilibrium networks, including ones showing that equilibria can have rather general binary tree-like structure, including highly asymmetric binary trees. This is in contrast to other network formation games that yield only symmetric equilibrium networks. Our equilibria also include ones with large or small diameter, and ones with wide variance of degrees.
- A general characterization of (non-degenerate) equilibrium networks, showing that such networks are always sparse and paid for by low-degree vertices, whereas high-degree "free riders" always have low utility.
- A proof that for edge cost α ≥ 1/2 the Price of Anarchy grows linearly with the population size n, while for edge cost α less than 1/2 the Price of Anarchy of the formation game is bounded by a constant depending only on α, and independent of n. Moreover, an explicit upper bound is constructed when the edge cost is a "simple" rational (small numerator) less than 1/2.
- A proof that for edge cost α less than 1/2 the average vertex clustering coefficient grows at least as fast as a function depending only on α, while the overall edge density goes to zero at a rate inversely proportional to the number of vertices in the network.
- Results establishing the intractability of even weakly approximating best response computations.
Several of our results hold even for weaker notions of equilibrium, such as those based on link stability.
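The quantities this abstract compares can be computed directly on a small graph. The sketch below assumes an undirected graph stored as a dict of neighbor sets, and uses the convention that a vertex with fewer than two neighbors has clustering coefficient 0. The `utility` function is a hypothetical reading of "balance edge purchases with a desire to maximize their own clustering coefficient" (clustering coefficient minus a per-edge cost, called `alpha` here); the abstract does not state the exact payoff.

```python
from itertools import combinations


def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves adjacent."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention assumed for vertices with fewer than two neighbors
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


def edge_density(adj):
    """Number of edges divided by the number of vertex pairs."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    return m / (n * (n - 1) / 2)


def utility(adj, v, purchased, alpha):
    """ASSUMPTION: payoff = own clustering coefficient minus alpha per purchased edge."""
    return clustering_coefficient(adj, v) - alpha * purchased
```

On a triangle with one pendant vertex attached, the triangle vertices of degree two have clustering coefficient 1 while the overall edge density is 4/6, a small instance of local density exceeding global density.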
Over the last decade we have witnessed the rapid proliferation of online networks and Internet activity. Although such activity is generally considered a blessing, it also brings with it a large increase in the risk of computer malware: malignant software that actively spreads from one computer to another. To date, the majority of existing models of malware spread assume stochastic behavior, in which the set of neighbors infected from the current set of infected nodes is chosen obliviously. In this work, we initiate the study of planned-infection strategies that can decide intelligently which neighbors of infected nodes to infect next in order to maximize their spread, while maintaining a "signature" similar to the oblivious stochastic infection strategy in order not to be discovered. We first establish that computing optimal and near-optimal planned strategies is computationally hard. We then identify necessary and sufficient conditions, in terms of network structure and edge infection probabilities, such that the planned process can infect polynomially more nodes than the stochastic process while maintaining a similar "signature" to the oblivious stochastic infection strategy. Among our results is a surprising connection between an additional structural quantity of interest in a network, the network toughness, and planned infections. Based on the network toughness, we characterize networks where the existence of planned strategies that are pandemic (infect all nodes) is guaranteed, and where such strategies are efficiently computable.
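Assuming "network toughness" refers to the standard graph-theoretic notion (the minimum, over vertex sets S whose removal leaves at least two components, of |S| divided by the number of components left), it can be computed by brute force on very small connected graphs. The sketch below is exponential in the number of vertices and is meant only to make the definition concrete; the function names are hypothetical, and a graph with no separating set, such as a complete graph, is reported as infinite.

```python
from itertools import combinations


def count_components(adj, removed):
    """Count connected components of the graph after deleting `removed`."""
    seen = set(removed)
    components = 0
    for s in adj:
        if s in seen:
            continue
        components += 1
        stack = [s]
        seen.add(s)
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return components


def toughness(adj):
    """Brute-force toughness of a small connected graph (dict of adjacency lists)."""
    nodes = list(adj)
    best = float("inf")
    # At least two vertices must remain for the graph to split into two parts.
    for size in range(1, len(nodes) - 1):
        for removed in combinations(nodes, size):
            parts = count_components(adj, removed)
            if parts >= 2:
                best = min(best, size / parts)
    return best
```

For instance, a 4-cycle has toughness 1 (removing two opposite vertices leaves two components), while a path on three vertices has toughness 1/2.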