Abstract. Although extensive studies have been conducted on online social networks (OSNs), it is not clear how to characterize information propagation and social influence, two types of important but not well defined social behavior. This paper presents a measurement study of 58M messages collected from 700K users on Twitter.com, a popular social medium. We analyze the propagation patterns of general messages and show how breaking news (Michael Jackson's death) spread through Twitter. Furthermore, we evaluate different social influences by examining their stabilities, assessments, and correlations. This paper addresses the complications as well as challenges we encounter when measuring message propagation and social influence on OSNs. We believe that our results here provide valuable insights for future OSN research.
Extensive research has been conducted on top of online social networks (OSNs), while little attention has been paid to the data collection process. Due to the large scale of OSNs and their privacy control policies, a partial data set is often used for analysis. The data set analyzed is decided by many factors including the choice of seeds, node selection algorithms, and the sample size. These factors may introduce biases and further contaminate or even skew the results. To evaluate the impact of different factors, this paper examines the OSN graph crawling problem, where the nodes are OSN users and the edges are the links (or relationship) among these users. More specifically, by looking at various factors in the crawling process, the following problems are addressed in this paper:• Efficiency: How fast different crawlers discover nodes/links; • Sensitivity: How different OSNs and the number of protected users affect crawlers; • Bias: How major graph properties are skewed. To the best of our knowledge, our simulations on four real world online social graphs provide the first in-depth empirical answers to these questions. 1
To protect user privacy in the search engine context, most current approaches, such as private information retrieval and privacy preserving data mining, require a server-side deployment, thus users have little control over their data and privacy. In this paper we propose a user-side solution within the context of keyword based search. We model the search privacy threat as an information inference problem and show how to inject noise into user queries to minimize privacy breaches. The search privacy breach is measured as the mutual information between real user queries and the diluted queries seen by search engines. We give the lower bound for the amount of noise queries required by a perfect privacy protection and provide the optimal protection given the number of noise queries. We verify our results with a special case where the number of noise queries is equal to the number of user queries. The simulation result shows that the noise given by our approach greatly reduces privacy breaches and outperforms random noise. As far as we know, this work presents the first theoretical analysis on user side noise injection for search privacy protection.
PageRank has been widely used as a major factor in search engine ranking systems. However, global link graph information is required when computing PageRank, which causes prohibitive communication cost to achieve accurate results in distributed solution. In this paper, we propose a distributed PageRank computation algorithm based on iterative aggregation-disaggregation (IAD) method with Block Jacobi smoothing. The basic idea is divide-and-conquer. We treat each web site as a node to explore the block structure of hyperlinks. Local PageRank is computed by each node itself and then updated with a low communication cost with a coordinator. We prove the global convergence of the Block Jacobi method and then analyze the communication overhead and major advantages of our algorithm. Experiments on three real web graphs show that our method converges 5-7 times faster than the traditional Power method. We believe our work provides an efficient and practical distributed solution for PageRank on large scale Web graphs.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.