Extensive research has been conducted on online social networks (OSNs), while little attention has been paid to the data collection process. Because of the large scale of OSNs and their privacy control policies, analyses are often based on a partial data set. Which data set is analyzed depends on many factors, including the choice of seeds, the node selection algorithm, and the sample size. These factors may introduce biases that contaminate or even skew the results. To evaluate the impact of these factors, this paper examines the OSN graph crawling problem, where the nodes are OSN users and the edges are the links (or relationships) among them. More specifically, by examining various factors in the crawling process, this paper addresses the following problems:
• Efficiency: how fast different crawlers discover nodes and links;
• Sensitivity: how different OSNs and the number of protected users affect crawlers;
• Bias: how major graph properties are skewed.
To the best of our knowledge, our simulations on four real-world online social graphs provide the first in-depth empirical answers to these questions.
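The abstract does not list which crawlers are compared, so the following is only a minimal sketch of one common node-selection strategy, breadth-first search (BFS), showing how a crawl budget and protected users limit what a crawler discovers. The graph representation (a dict of neighbor sets), the function name `bfs_crawl`, and the rule that a protected user is visible to neighbors but has a hidden friend list are illustrative assumptions, not the paper's setup.

```python
from collections import deque


def bfs_crawl(graph, seed, protected, budget):
    """Breadth-first crawl of an undirected graph {node: set(neighbors)}.

    Assumption (hypothetical privacy model): a protected user is discovered
    when a neighbor lists them, but their own friend list cannot be fetched.
    `budget` caps the number of friend-list queries that succeed.
    Returns (discovered_nodes, discovered_edges).
    """
    discovered = {seed}
    edges = set()
    queue = deque([seed])
    queried = 0
    while queue and queried < budget:
        u = queue.popleft()
        if u in protected:
            continue  # friend list is hidden, so there is nothing to expand
        queried += 1
        for v in graph[u]:
            edges.add(frozenset((u, v)))  # undirected edge, stored once
            if v not in discovered:
                discovered.add(v)
                queue.append(v)
    return discovered, edges
```

Running the same crawl with and without a protected set, or with different budgets and seeds, gives a simple way to compare how many nodes and links each configuration uncovers.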
To protect user privacy in the search engine context, most current approaches, such as private information retrieval and privacy-preserving data mining, require server-side deployment, so users have little control over their data and privacy. In this paper we propose a user-side solution for keyword-based search. We model the search privacy threat as an information inference problem and show how to inject noise into user queries to minimize privacy breaches. The privacy breach is measured as the mutual information between the real user queries and the diluted queries seen by the search engine. We give a lower bound on the number of noise queries required for perfect privacy protection and derive the optimal protection for a given number of noise queries. We verify our results on a special case in which the number of noise queries equals the number of user queries. Simulation results show that the noise generated by our approach greatly reduces privacy breaches and outperforms random noise. As far as we know, this work presents the first theoretical analysis of user-side noise injection for search privacy protection.
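The privacy metric above is mutual information. As a small illustration of the metric only (the abstract does not specify how real and noise queries are modeled as random variables), the sketch below computes I(X;Y) in bits from a joint distribution given as a dict `{(x, y): p}`; the function name and the toy distributions are assumptions.

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """I(X;Y) in bits from a joint distribution {(x, y): p} summing to 1."""
    px = defaultdict(float)
    py = defaultdict(float)
    for (x, y), p in joint.items():  # marginals P(X) and P(Y)
        px[x] += p
        py[y] += p
    mi = 0.0
    for (x, y), p in joint.items():
        if p > 0:  # terms with p = 0 contribute nothing
            mi += p * math.log2(p / (px[x] * py[y]))
    return mi


# No noise: the engine sees exactly the real query, so all of H(X) leaks.
no_noise = {("a", "a"): 0.5, ("b", "b"): 0.5}
# Perfect dilution: the observed query is independent of the real one.
independent = {(x, y): 0.25 for x in "ab" for y in "ab"}
```

For `no_noise` the leak is 1 bit (the full entropy of a uniform binary query), and for `independent` it is 0, which corresponds to perfect protection; any noise scheme falls between these extremes.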
In practical grey clustering evaluation, the method based on the center-point triangular whitenization weight function can produce grey clustering intervals that are partly too long. To address this problem, this paper proposes a new grey evaluation method based on an improved triangular whitenization weight function. Drawing on the ideas behind the end-point and center-point triangular whitenization weight functions, we construct a new compact-center-point triangular whitenization weight function. The three kinds of triangular whitenization weight functions are then compared in several respects, including the crossing properties of grey clusters, clustering coefficients, rules for grey clustering intervals, rules for choosing end-points, and clustering performance. Next, the three methods are applied to an example evaluating a basin initial water rights allocation scheme, further verifying that the new grey clustering evaluation method is feasible and effective. The results indicate that the compact-center-point triangular whitenization weight function clearly outperforms both the end-point and the center-point triangular whitenization weight functions.
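The abstract does not define the new compact-center-point function, so it is not reproduced here. For reference, the sketch below shows the baseline center-point triangular whitenization weight function in the form we understand to be standard in grey system theory: with class centers λ_1 < … < λ_s and extended end values λ_0, λ_{s+1}, f_k rises linearly from 0 at λ_{k-1} to 1 at λ_k and falls back to 0 at λ_{k+1}. The object is assigned to the class with the largest weighted coefficient σ_k = Σ_j η_j f_k^j(x_j). The function names, argument layout, and the assumption that weights η_j sum to 1 are illustrative.

```python
def center_point_weight(x, centers, k):
    """Center-point triangular whitenization weight f_k(x).

    centers = [l0, l1, ..., ls, l_{s+1}] (strictly increasing), where
    l1..ls are grey-class centers and l0, l_{s+1} are extended end values.
    k is a class index in 1..s.
    """
    lo, mid, hi = centers[k - 1], centers[k], centers[k + 1]
    if x < lo or x > hi:
        return 0.0
    if x <= mid:
        return (x - lo) / (mid - lo)  # rising edge, reaches 1 at the center
    return (hi - x) / (hi - mid)      # falling edge, reaches 0 at next center


def grey_cluster(sample, criteria_centers, weights):
    """Return (coefficients sigma_1..sigma_s, best 1-based class) for one object.

    sample[j]: observed value of criterion j
    criteria_centers[j]: extended center list for criterion j
    weights[j]: criterion weight eta_j (assumed to sum to 1)
    """
    s = len(criteria_centers[0]) - 2
    sigma = [
        sum(w * center_point_weight(x, c, k)
            for x, c, w in zip(sample, criteria_centers, weights))
        for k in range(1, s + 1)
    ]
    best = max(range(s), key=lambda i: sigma[i]) + 1
    return sigma, best
```

With centers `[0, 2, 5, 8, 10]`, a value of 3.5 gets weight 0.5 in both class 1 and class 2; between adjacent centers the two overlapping functions always sum to 1, which is the crossing property the comparison above refers to.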
The huge size of online social networks (OSNs) makes it prohibitively expensive to measure precisely any property that requires knowledge of the entire graph. To estimate the size of an OSN, i.e., the number of users it has, this paper introduces two estimators that use widely available OSN functionalities and services. The first is a maximum likelihood estimator (MLE) based on uniform sampling. An O(log n) algorithm is developed to compute this estimator; in our experiments it is 70 times faster than the naive linear probing algorithm. The second estimator is based on random walks, and we generalize it to estimate other graph properties. In-depth evaluations on six real OSNs show the bias and variance of the two estimators. Our analysis addresses the challenges and pitfalls in developing and implementing such estimators for OSNs.
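To illustrate why an MLE for population size admits a logarithmic-time search instead of linear probing, here is a minimal sketch under an assumed sampling model that the abstract does not spell out: m users are drawn uniformly at random with replacement and d of them are distinct. Then L(N) ∝ N!/((N−d)! N^m), the ratio L(N+1)/L(N) is greater than 1 up to the maximizer and at most 1 afterwards, so galloping plus binary search finds the MLE in O(log N) ratio evaluations. The function names are hypothetical, and this is not claimed to be the paper's algorithm.

```python
import math


def _log_ratio(N, m, d):
    """log L(N+1) - log L(N) with L(N) proportional to N!/((N-d)! N^m)."""
    return -math.log1p(-d / (N + 1)) - m * math.log1p(1 / N)


def mle_population_size(m, d):
    """MLE of population size from m uniform samples (with replacement), d distinct."""
    if not 1 <= d <= m:
        raise ValueError("need 1 <= d <= m")
    if d == m:
        raise ValueError("no repeated samples: the MLE is unbounded")
    # L(N) = 0 for N < d. The log-ratio is positive up to the maximizer and
    # non-positive afterwards, so we look for the first N with ratio <= 0.
    lo = d
    if _log_ratio(lo, m, d) <= 0:
        return lo
    hi = 2 * d
    while _log_ratio(hi, m, d) > 0:  # galloping search for an upper bound
        lo, hi = hi, hi * 2
    # Invariant: ratio(lo) > 0 and ratio(hi) <= 0.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if _log_ratio(mid, m, d) > 0:
            lo = mid
        else:
            hi = mid
    return hi
```

For example, `mle_population_size(3, 2)` returns 2, since L(2) = 1/4 exceeds L(3) = 2/9. A linear scan would evaluate the likelihood at every N from d up to the estimate, whereas this search needs only about 2 log2 of the estimate ratio evaluations.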