We examine a very large-scale data set of more than 30 billion call records made by 25 million cell phone users across all 50 US states and attempt to determine to what extent anonymized location data can reveal private user information. Our approach is to infer, from the call records, the "top N" locations for each user and correlate this information with publicly available side information such as census data. For example, the measured "top 2" locations likely correspond to home and work, and the "top 3" to home, work, and shopping/school/commute-path locations. We consider cases where those "top N" locations are measured at different levels of granularity, ranging from a cell sector to a whole cell, zip code, city, county, and state. We then compute the anonymity set, namely the number of users uniquely identified by a given set of "top N" locations at each granularity level. We find that the "top 1" location does not typically yield small anonymity sets; however, the top 2 and top 3 locations do, certainly at sector- or cell-level granularity. We consider a variety of factors that might affect the size of the anonymity set, for example the distance between the "top N" locations or the geographic environment (rural vs. urban). We also examine to what extent specific side information, in particular the size of the user's social network, decreases the anonymity set and therefore increases the risk to privacy. Our study shows that sharing anonymized location data will likely lead to privacy risks and that, at a minimum, the data needs to be coarse in either the time domain (meaning the data is collected over short periods of time, in which case inferring the top N locations reliably is difficult) or the space domain (meaning the data granularity is strictly coarser than the cell level). In both cases, the utility of the anonymized location data will be decreased, potentially by a significant amount.
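The anonymity-set computation described in the abstract can be sketched as follows: group users by their ordered tuple of "top N" locations and count the group sizes. The location labels and user IDs below are hypothetical, and the paper's actual pipeline over billions of call records is of course far more involved.

```python
from collections import defaultdict

def anonymity_sets(user_top_locations):
    """Group users by their ordered top-N location tuple and return
    the size of each resulting anonymity set (a size of 1 means the
    user is uniquely identified by those locations)."""
    groups = defaultdict(list)
    for user, locs in user_top_locations.items():
        groups[tuple(locs)].append(user)
    return {key: len(users) for key, users in groups.items()}

# Hypothetical users identified by their top-2 cell sectors.
users = {
    "u1": ["sector_17", "sector_03"],
    "u2": ["sector_17", "sector_03"],
    "u3": ["sector_17", "sector_42"],
}
sizes = anonymity_sets(users)
# u3's top-2 pair is unique: an anonymity set of size 1.
```

Coarsening the granularity (e.g. replacing sector labels with zip codes) merges groups and enlarges the anonymity sets, which is the space-domain trade-off the abstract describes.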
DNA is an emerging medium for digital data and its adoption can be accelerated by synthesis processes specialized for storage applications. Here, we describe a de novo enzymatic synthesis strategy designed for data storage which harnesses the template-independent polymerase terminal deoxynucleotidyl transferase (TdT) in kinetically controlled conditions. Information is stored in transitions between non-identical nucleotides of DNA strands. To produce strands representing user-defined content, nucleotide substrates are added iteratively, yielding short homopolymeric extensions whose lengths are controlled by apyrase-mediated substrate degradation. With this scheme, we synthesize DNA strands carrying 144 bits, including addressing, and demonstrate retrieval with streaming nanopore sequencing. We further devise a digital codec to reduce requirements for synthesis accuracy and sequencing coverage, and experimentally show robust data retrieval from imperfectly synthesized strands. This work provides distributive enzymatic synthesis and information-theoretic approaches to advance digital information storage in DNA.
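The core idea of storing information in transitions between non-identical nucleotides can be sketched with a ternary code: since each transition must move to one of the three bases different from the previous one, each transition carries one trit, and homopolymer run lengths carry no information. This is only a minimal illustration of the transition-coding principle; the paper's actual codec, addressing scheme, and error tolerance are omitted.

```python
BASES = "ACGT"

def encode_transitions(trits, start="A"):
    """Map a sequence of trits (values 0-2) to a DNA strand in which
    each trit selects one of the three bases differing from the
    previous base."""
    strand = [start]
    for t in trits:
        choices = [b for b in BASES if b != strand[-1]]
        strand.append(choices[t])
    return "".join(strand)

def decode_transitions(strand):
    """Recover the trits: collapse homopolymer runs (their lengths are
    uninformative), then invert each transition."""
    collapsed = [strand[0]]
    for b in strand[1:]:
        if b != collapsed[-1]:
            collapsed.append(b)
    trits = []
    for prev, cur in zip(collapsed, collapsed[1:]):
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(cur))
    return trits
```

Because decoding collapses runs first, strands with variable-length homopolymeric extensions (as produced by the kinetically controlled TdT synthesis) decode to the same trit sequence as the ideal strand.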
End-to-end congestion control mechanisms such as those in TCP are not enough to prevent congestion collapse in the Internet (for starters, not all applications might be willing to use them), and they must be supplemented by control mechanisms inside the network. The IRTF has singled out Random Early Detection (RED) as one queue management scheme recommended for rapid deployment throughout the Internet. However, RED is not a thoroughly understood scheme; witness, for example, how the recommended parameter settings, or even the various benefits RED is claimed to provide, have changed over the past few years. In this paper, we describe simple analytic models for RED and use these models to quantify the benefits (or lack thereof) brought about by RED. In particular, we examine the impact of RED on the loss and delay suffered by bursty and less bursty traffic (such as TCP and UDP traffic, respectively). We find that (i) RED does eliminate the higher loss bias against bursty traffic observed with Tail Drop, but not by decreasing the loss rate of bursty traffic; rather, by increasing that of non-bursty traffic; (ii) the number of consecutive packet drops is higher with RED than with Tail Drop, suggesting RED might not help as anticipated with the global synchronization of TCP flows; (iii) RED can be used to control the average queueing delay in routers and hence the end-to-end delay, but it increases the jitter of non-bursty streams. Thus, applications that generate smooth traffic, such as interactive audio applications, will suffer higher loss rates and require large playout buffers, thereby negating at least in part the lower mean delay brought about by RED.
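For context, RED's drop decision is driven by an exponentially weighted moving average of the queue length, with a drop probability that rises linearly between two thresholds. A simplified sketch follows (parameter values are illustrative, and the count-based probability adjustment of the full RED algorithm is omitted):

```python
def update_avg(avg, queue_len, w=0.002):
    """Exponentially weighted moving average of the instantaneous
    queue length, as used by RED to smooth out bursts."""
    return (1 - w) * avg + w * queue_len

def red_drop_probability(avg, min_th, max_th, max_p):
    """RED's marking/drop probability: 0 below min_th, rising
    linearly to max_p at max_th, and 1 (forced drop) above max_th."""
    if avg < min_th:
        return 0.0
    if avg >= max_th:
        return 1.0
    return max_p * (avg - min_th) / (max_th - min_th)

# Illustrative thresholds: min_th=5, max_th=15 packets, max_p=0.1.
p_mid = red_drop_probability(10, 5, 15, 0.1)  # halfway: 0.05
```

Because drops are probabilistic rather than triggered only at buffer overflow, RED spreads losses across flows; the paper's analysis quantifies how this shifts loss from bursty to non-bursty traffic.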