In this paper we investigate a new idea for traffic matrix estimation that makes the basic problem less under-constrained, by deliberately changing the routing to obtain additional measurements. Because all these measurements are collected over disparate time intervals, we need to establish models for each Origin-Destination (OD) pair to capture the complex behaviours of Internet traffic. We model each OD pair with two components: the diurnal pattern and the fluctuation process. We provide models that incorporate these two components, to estimate both the first and second order moments of traffic matrices. We do this for both stationary and cyclo-stationary traffic scenarios. We formalize the problem of estimating the second order moment in a way that is completely independent from the first order moment. Moreover, we can estimate the second order moment without needing any routing changes (i.e., without explicit changes to IGP link weights). We prove, for the first time, that such a result holds for any realistic topology under the assumption of minimum cost routing and strictly positive link weights. We highlight how the second order moment helps identify the largest OD flows, which carry the most significant fraction of network traffic. We then propose a refined methodology that uses our variance estimator (without routing changes) to identify the largest flows, and then estimates only these flows. The benefit of this method is that it dramatically reduces the number of routing changes needed. We validate the effectiveness of our methodology and the intuitions behind it by using real aggregated sampled NetFlow data collected from a commercial Tier-1 backbone.
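To illustrate why routing changes make the estimation problem less under-constrained, the following is a minimal sketch and not the authors' method or data. It assumes a hypothetical 4-node ring with symmetric link weights, single-path minimum cost routing, and link loads related to OD flows by a 0/1 routing matrix. The topology, the weight values, and all function names are assumptions chosen for illustration. The sketch builds the routing matrix under two weight settings and compares the rank of one snapshot with the rank of the two snapshots stacked together.

```python
import heapq
from fractions import Fraction
from itertools import permutations

# Hypothetical toy topology: a 4-node ring (not taken from the paper).
NODES = [0, 1, 2, 3]
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0)]
# Each undirected edge yields two directed links with their own load counter.
LINKS = [(u, v) for a, b in EDGES for (u, v) in ((a, b), (b, a))]
OD_PAIRS = list(permutations(NODES, 2))  # 12 unknown OD flows


def symmetric(edge_costs):
    """Map one cost per undirected edge onto both directed links."""
    cost = {}
    for (a, b), w in zip(EDGES, edge_costs):
        cost[(a, b)] = cost[(b, a)] = w
    return cost


def shortest_path(cost, src, dst):
    """Dijkstra; returns the directed links on the minimum cost path."""
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for (a, b) in LINKS:
            if a == u and d + cost[(a, b)] < dist.get(b, float("inf")):
                dist[b] = d + cost[(a, b)]
                prev[b] = a
                heapq.heappush(heap, (dist[b], b))
    path, node = [], dst
    while node != src:
        path.append((prev[node], node))
        node = prev[node]
    return path


def routing_matrix(cost):
    """Rows are links, columns are OD pairs; entry 1 if the pair uses the link."""
    rows = [[0] * len(OD_PAIRS) for _ in LINKS]
    for j, (s, d) in enumerate(OD_PAIRS):
        for link in shortest_path(cost, s, d):
            rows[LINKS.index(link)][j] = 1
    return rows


def rank(matrix):
    """Exact rank by Gaussian elimination over rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r


# Two assumed weight settings, chosen so that no equal-cost ties occur.
A1 = routing_matrix(symmetric([1, 2, 4, 4]))
A2 = routing_matrix(symmetric([4, 4, 1, 2]))

rank_single = rank(A1)        # one routing snapshot
rank_stacked = rank(A1 + A2)  # list concatenation stacks the two snapshots
print(len(OD_PAIRS), rank_single, rank_stacked)  # prints: 12 8 11
```

In this toy case a single snapshot constrains only 8 of the 12 unknowns, while the second weight setting raises the rank to 11. Stacking is only meaningful if the OD flows can be related across the two measurement intervals, which is why the abstract pairs routing changes with per-OD traffic models.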
In order to control and manage highly aggregated Internet traffic flows efficiently, we need to be able to categorize flows into distinct classes and to understand the different behaviour of flows belonging to these classes. In this paper we consider the problem of classifying BGP level prefix flows into a small set of homogeneous classes. We argue that using the full distributional properties of flows can significantly improve the quality of the derived classification. We propose a method based on modeling flow histograms using Dirichlet Mixture Processes for random distributions. We present an inference procedure based on the Simulated Annealing Expectation Maximization algorithm that estimates all the model parameters as well as flow membership probabilities, that is, the probability that a flow belongs to any given class. One of our key contributions is a new method for Internet flow classification. We show that our method is powerful in that it is capable of examining macroscopic flows while simultaneously making fine distinctions between different traffic classes. We demonstrate that our scheme can address issues with flows being close to class boundaries and the inherent dynamic behaviour of Internet flows.
Online services routinely mine user data to predict user preferences, make recommendations, and place targeted ads. Recent research has demonstrated that several private user attributes (such as political affiliation, sexual orientation, and gender) can be inferred from such data. Can a privacy-conscious user benefit from personalization while simultaneously protecting her private attributes? We study this question in the context of a rating prediction service based on matrix factorization. We construct a protocol of interactions between the service and users that has remarkable optimality properties: it is privacy-preserving, in that no inference algorithm can succeed in inferring a user's private attribute with a probability better than random guessing; it has maximal accuracy, in that no other privacy-preserving protocol improves rating prediction; and, finally, it involves a minimal disclosure, as the prediction accuracy strictly decreases when the service reveals less information. We extensively evaluate our protocol using several rating datasets, demonstrating that it successfully blocks the inference of gender, age and political affiliation, while incurring less than 5% decrease in the accuracy of rating prediction.