In this paper, we show a construction of locality-sensitive hash functions without false negatives, i.e., which ensure collision for every pair of points within a given radius R in d dimensional space equipped with lp norm when p ∈ [1, ∞]. Furthermore, we show how to use these hash functions to solve the c-approximate nearest neighbor search problem without false negatives. Namely, if there is a point at distance R, we will certainly report it and points at distance greater than cR will not be reported for c = Ω( √ d, d1− 1 p ). The constructed algorithms work:• with preprocessing time O(n log(n)) and sublinear expected query time,• with preprocessing time O(poly(n)) and expected query time O(log(n)).Our paper reports progress on answering the open problem presented by Pagh [8], who considered the nearest neighbor search without false negatives for the Hamming distance.
We introduce random directed acyclic graph and use it to model the information diffusion network. Subsequently, we analyze the cascade generation model (CGM) introduced by Leskovec et al. [19]. Until now only empirical studies of this model were done. In this paper, we present the first theoretical proof that the sizes of cascades generated by the CGM follow the power-law distribution, which is consistent with multiple empirical analysis of the large social networks. We compared the assumptions of our model with the Twitter social network and tested the goodness of approximation.
How information spreads through a social network? Can we assume, that the information is spread only through a given social network graph? What is the correct way to compare the models of information flow? These are the basic questions we address in this work. We focus on meticulous comparison of various, well-known models of rumor propagation in the social network. We introduce the model incorporating mass media and effects of absent nodes. In this model the information appears spontaneously in the graph. Using the most conservative metric, we showed that the distribution of cascades sizes generated by this model fits the real data much better than the previously considered models.Comment: 8 pages, 2 figures, Hypertext 201
No abstract
This paper introduces the concept of traffic-fingerprints, i.e., normalized 24-dimensional vectors representing a distribution of daily traffic on a web page. Using k-means clustering we show that similarity of traffic-fingerprints is related to the similarity of profitability time patterns for ads shown on these pages. In other words, these fingerprints are correlated with the conversions rates, thus allowing us to argue about conversion rates on pages with negligible traffic. By blocking or unblocking whole clusters of pages we were able to increase the revenue of online campaigns by more than 50%.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.