We present new algorithms for Personalized PageRank estimation and Personalized PageRank search. First, for the problem of estimating Personalized PageRank (PPR) from a source distribution to a target node, we present a new bidirectional estimator with simple yet strong guarantees on correctness and performance, and 3x to 8x speedup over existing estimators in experiments on a diverse set of networks. Moreover, it has a clean algebraic structure which enables it to be used as a primitive for the Personalized PageRank Search problem: Given a network like Facebook, a query like "people named John," and a searching user, return the top nodes in the network ranked by PPR from the perspective of the searching user. Previous solutions either score all nodes or score candidate nodes one at a time, which is prohibitively slow for large candidate sets. We develop a new algorithm based on our bidirectional PPR estimator which identifies the most relevant results by sampling candidates based on their PPR; this is the first solution to PPR search that can find the best results without iterating through the set of all candidate results. Finally, by combining PPR sampling with sequential PPR estimation and Monte Carlo, we develop practical algorithms for PPR search, and we show via experiments that our algorithms are efficient on networks with billions of edges.
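The general idea of a bidirectional PPR estimate can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it combines a textbook reverse-push pass from the target with random walks from the source, and it assumes a directed graph given as an adjacency dict in which every node has at least one out-neighbor. The function names, the teleport probability `alpha`, the residual threshold `r_max`, and the walk count are all hypothetical choices made for the example.

```python
import random
from collections import defaultdict

def reverse_push(out_nbrs, target, alpha, r_max):
    """Push residual mass backward from `target` until every residual <= r_max.
    Maintains: ppr(s, target) = p[s] + sum_v ppr(s, v) * r[v] for every s."""
    in_nbrs = defaultdict(list)
    for u, vs in out_nbrs.items():
        for v in vs:
            in_nbrs[v].append(u)
    p, r = defaultdict(float), defaultdict(float)
    r[target] = 1.0
    stack = [target]
    while stack:
        v = stack.pop()
        if r[v] <= r_max:
            continue  # stale entry, already pushed
        mass, r[v] = r[v], 0.0
        p[v] += alpha * mass
        for u in in_nbrs[v]:
            r[u] += (1 - alpha) * mass / len(out_nbrs[u])
            if r[u] > r_max:
                stack.append(u)
    return p, r

def walk_endpoint(out_nbrs, source, alpha, rng):
    """Random walk that stops with probability alpha at each step;
    its endpoint is distributed according to ppr(source, .)."""
    v = source
    while rng.random() >= alpha:
        v = rng.choice(out_nbrs[v])
    return v

def bidirectional_ppr(out_nbrs, source, target, alpha=0.2,
                      r_max=1e-3, walks=2000, seed=0):
    """Estimate ppr(source, target) as p[source] plus the average
    residual observed at the endpoints of walks from the source."""
    rng = random.Random(seed)
    p, r = reverse_push(out_nbrs, target, alpha, r_max)
    avg = sum(r[walk_endpoint(out_nbrs, source, alpha, rng)]
              for _ in range(walks)) / walks
    return p[source] + avg

def exact_ppr(out_nbrs, source, alpha, iters=200):
    """Reference values by power iteration (only practical on small graphs)."""
    pi = {v: 0.0 for v in out_nbrs}
    pi[source] = 1.0
    for _ in range(iters):
        new = {v: 0.0 for v in out_nbrs}
        new[source] += alpha
        for u, vs in out_nbrs.items():
            for v in vs:
                new[v] += (1 - alpha) * pi[u] / len(vs)
        pi = new
    return pi

if __name__ == "__main__":
    graph = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}  # toy graph, hypothetical
    print(bidirectional_ppr(graph, source=0, target=2))
    print(exact_ppr(graph, source=0, alpha=0.2)[2])
```

Because every residual is at most `r_max` once the push phase ends, the random-walk term only has to estimate a small, bounded quantity. In this sketch that is what lets a modest number of walks give a tight estimate.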
Online cancer communities help members support one another, provide new perspectives about living with cancer, normalize experiences, and reduce isolation. The American Cancer Society's 166,000-member Cancer Survivors Network (CSN) is the largest online peer support community for cancer patients, survivors, and caregivers. Sentiment analysis and topic modeling were applied to CSN breast and colorectal cancer discussion posts from 2005 to 2010 to examine how the sentiment change of thread initiators, a measure of social support, varies by discussion topic. The support provided in CSN is highest for medical, lifestyle, and treatment issues. Threads related to 1) treatments and side effects, surgery, mastectomy and reconstruction, and decision making for breast cancer, 2) lung scans, and 3) treatment drugs in colon cancer begin with high negative sentiment and produce high average sentiment change. Using text mining tools to assess sentiment, sentiment change, and thread topics provides new insights that community managers can use to facilitate member interactions and enhance support outcomes.
Markov Random Fields (MRFs), a.k.a. Graphical Models, serve as popular models for networks in the social and biological sciences, as well as communications and signal processing. A central problem is one of structure learning or model selection: given samples from the MRF, determine the graph structure of the underlying distribution. When the MRF is not Gaussian (e.g., the Ising model) and contains cycles, structure learning is known to be NP-hard even with infinite samples. Existing approaches typically focus either on specific parametric classes of models, or on the sub-class of graphs with bounded degree; the complexity of many of these methods grows quickly in the degree bound. We develop a simple new 'greedy' algorithm for learning the structure of graphical models of discrete random variables. It learns the Markov neighborhood of a node by sequentially adding to it the node that produces the highest reduction in conditional entropy. We provide a general sufficient condition for exact structure recovery (under conditions on the degree/girth/correlation decay), and study the algorithm's sample and computational complexity. We then consider its implications for the Ising model, for which we establish a self-contained condition for exact structure recovery.
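The greedy step described above can be illustrated with a short sketch. It is a simplified reading of the abstract, not the paper's algorithm: it uses plug-in (empirical) conditional entropies, and the stopping rule (stop when the best entropy reduction falls below a threshold `epsilon`) and all function names are assumptions introduced for the example.

```python
import math
from collections import Counter

def cond_entropy(samples, i, cond):
    """Empirical conditional entropy H(X_i | X_cond) in nats.
    `samples` is a list of equal-length tuples of discrete values."""
    n = len(samples)
    joint = Counter((tuple(x[j] for j in cond), x[i]) for x in samples)
    marg = Counter(tuple(x[j] for j in cond) for x in samples)
    return -sum(c / n * math.log(c / marg[key])
                for (key, _), c in joint.items())

def greedy_neighborhood(samples, i, epsilon=0.05):
    """Grow an estimated neighborhood of node i by repeatedly adding the
    node that most reduces the conditional entropy of X_i.
    Stops when the best reduction is below `epsilon` (assumed rule)."""
    n_vars = len(samples[0])
    nbrs = []
    current = cond_entropy(samples, i, nbrs)
    while True:
        candidates = [j for j in range(n_vars) if j != i and j not in nbrs]
        if not candidates:
            break
        scores = {j: cond_entropy(samples, i, nbrs + [j]) for j in candidates}
        best = min(scores, key=scores.get)
        if current - scores[best] < epsilon:
            break
        nbrs.append(best)
        current = scores[best]
    return nbrs

if __name__ == "__main__":
    # Toy data (hypothetical): X1 copies X0, X2 is independent of both.
    data = [(a, a, c) for a in (0, 1) for c in (0, 1)] * 10
    print(greedy_neighborhood(data, 0))
```

On real data the threshold has to be chosen with the sample size in mind, since empirical entropies are noisy. The conditions under which a procedure of this kind recovers the true graph are what the abstract's sufficient condition addresses.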
Optimizing shared vehicle systems (bike-sharing/car-sharing/ride-sharing) is more challenging than traditional resource allocation settings due to the presence of complex network externalities: changes in the demand/supply at any location affect future supply throughout the system within short timescales. These externalities are well captured by steady-state Markovian models, which are therefore widely used to analyze such systems. However, using such models to design pricing/control policies is computationally difficult since the resulting optimization problems are high-dimensional and non-convex. To this end, we develop a general approximation framework for designing pricing policies in shared vehicle systems, based on a novel convex relaxation which we term elevated flow relaxation. Our approach provides the first efficient algorithms with rigorous approximation guarantees for a wide range of objective functions (throughput, revenue, welfare). For any shared vehicle system with n stations and m vehicles, our framework provides a pricing policy with an approximation ratio of 1 + (n − 1)/m. This guarantee is particularly meaningful when m/n, the average number of vehicles per station, is large, as is often the case in practice. Further, the simplicity of our approach allows us to extend it to more complex settings: rebalancing empty vehicles, redirecting riders to nearby vehicles, multi-objective settings (such as Ramsey pricing), incorporating travel-times, etc. Our approach yields efficient algorithms with the same approximation guarantees for all these problems and, in the process, recovers several existing heuristics and asymptotic guarantees as special cases.
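To get a feel for the stated ratio 1 + (n − 1)/m, the small helper below evaluates it exactly for a few system sizes. The station and vehicle counts are hypothetical, and reading the reciprocal m/(m + n − 1) as the guaranteed fraction of the optimal objective is the conventional interpretation of an approximation ratio, assumed here rather than taken from the abstract.

```python
from fractions import Fraction

def approximation_ratio(n_stations, m_vehicles):
    """Return 1 + (n - 1)/m as an exact fraction."""
    if n_stations < 1 or m_vehicles < 1:
        raise ValueError("need at least one station and one vehicle")
    return 1 + Fraction(n_stations - 1, m_vehicles)

def guaranteed_fraction(n_stations, m_vehicles):
    """Reciprocal of the ratio, m / (m + n - 1): the share of the optimum
    the policy is guaranteed under the assumed interpretation."""
    return 1 / approximation_ratio(n_stations, m_vehicles)

if __name__ == "__main__":
    # Hypothetical system sizes: (stations, vehicles)
    for n, m in [(100, 100), (100, 1000), (100, 10000)]:
        ratio = approximation_ratio(n, m)
        print(n, m, float(ratio), float(guaranteed_fraction(n, m)))
```

With n fixed, the ratio approaches 1 as m grows, which matches the abstract's remark that the guarantee is most meaningful when the average number of vehicles per station is large.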