Abstract. Distinct social networks are interconnected via bridge users, who thus play a key role when information crossing between networks is investigated in the context of Social Internetworking analysis. Unfortunately, users do not always make their bridge role explicit by specifying the so-called me edge (i.e., the edge connecting the accounts of the same user in two distinct social networks), so potentially very useful information is lost. As a consequence, discovering missing me edges is an important problem in this context, yet one that has not been investigated so far. In this paper, we propose a common-neighbors approach to detecting missing me edges, which returns good results in real-life settings. Indeed, an experimental campaign shows both that the state-of-the-art common-neighbors approaches cannot be effectively applied to our problem and, conversely, that our approach returns precise and complete results.
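To make the notion of common neighbors across two networks concrete, the following is a minimal sketch of a generic baseline score. It is not the approach proposed in the paper, whose details the abstract does not give. It assumes each network is an adjacency dictionary and that some me edges are already known; the names `adj_a`, `adj_b`, `me_edges`, `cross_network_common_neighbors` and `rank_candidates` are hypothetical.

```python
def cross_network_common_neighbors(adj_a, adj_b, me_edges, u, v):
    """Count neighbors of u (network A) whose known counterpart in
    network B is also a neighbor of v. Illustrative baseline only."""
    # Map u's neighbors into network B through the known me edges.
    mapped = {me_edges[n] for n in adj_a.get(u, ()) if n in me_edges}
    return len(mapped & set(adj_b.get(v, ())))


def rank_candidates(adj_a, adj_b, me_edges, u):
    """Rank accounts of network B as candidate me-edge endpoints for u,
    skipping accounts that already have a known me edge."""
    already_linked = set(me_edges.values())
    scores = {
        v: cross_network_common_neighbors(adj_a, adj_b, me_edges, u, v)
        for v in adj_b
        if v not in already_linked
    }
    # Highest score first; ties broken by account name for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

A score of this kind can only count neighbors that already have a known me edge, which may be one reason such baselines struggle when me edges are sparse.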
Histograms are used to summarize the contents of relations into a number of buckets for the estimation of query result sizes. Several techniques (e.g., MaxDiff and V-Optimal) have been proposed in the past for determining bucket boundaries that provide accurate estimations. However, while search strategies for optimal bucket boundaries are rather sophisticated, not much attention has been paid to estimating queries inside buckets, and all of the above techniques adopt naive methods for such an estimation. This paper focuses on the problem of improving the estimation inside a bucket once its boundaries have been fixed. The proposed technique is based on the addition, to each bucket, of 32 bits of additional information (organized into a 4-level tree index, 4LT), storing approximate cumulative frequencies at 7 internal intervals of the bucket. Both theoretical analysis and experimental results show that, among a number of alternative ways to organize the additional information, the 4-level tree index provides the best frequency estimation inside a bucket. The index is then added to two well-known histograms, MaxDiff and V-Optimal, obtaining the non-obvious result that, despite the space cost of 4LT, which reduces the number of buckets allowed once the storage space has been fixed, the accuracy of the original methods is strongly improved.
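The idea of a small tree of approximate partial sums inside a bucket can be sketched as follows. This is an illustration under stated assumptions, not the paper's definition: the bucket is assumed to be split into 8 equal-width sub-intervals, each of the 7 tree nodes stores the left-half sum as a quantized fraction of its parent's sum, and the per-level bit widths of 6, 5 and 4 (1×6 + 2×5 + 4×4 = 32) are an assumed allocation that the abstract does not state. All function names are hypothetical.

```python
# Assumed bits per node at each level below the root: 1*6 + 2*5 + 4*4 = 32.
LEVEL_BITS = (6, 5, 4)


def quantize(part, total, bits):
    """Encode the fraction part/total as an integer in [0, 2**bits - 1]."""
    if total == 0:
        return 0
    return round(part / total * (2**bits - 1))


def dequantize(code, total, bits):
    """Recover an approximate partial sum from its code and the parent sum."""
    return code / (2**bits - 1) * total


def build_index(freqs):
    """Return 7 codes (breadth-first) for 8 sub-interval frequency sums."""
    if len(freqs) != 8:
        raise ValueError("expected 8 sub-interval sums")
    codes = []
    segments = [list(freqs)]
    for bits in LEVEL_BITS:
        next_segments = []
        for values in segments:
            half = len(values) // 2
            # Store the left half as a fraction of the whole segment.
            codes.append(quantize(sum(values[:half]), sum(values), bits))
            next_segments += [values[:half], values[half:]]
        segments = next_segments
    return codes


def decode_index(codes, total):
    """Rebuild 8 approximate sub-interval sums from the codes and bucket sum."""
    segments = [float(total)]
    remaining = iter(codes)
    for bits in LEVEL_BITS:
        next_segments = []
        for seg_total in segments:
            left = dequantize(next(remaining), seg_total, bits)
            # The right half is implied by the parent, so it needs no storage.
            next_segments += [left, seg_total - left]
        segments = next_segments
    return segments
```

Because each right-hand sum is derived from its parent, the decoded sub-interval sums always add up to the bucket total, and only the 7 left-hand fractions need to be stored.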