A Concentration of Measure Approach to Database De-anonymization

Shirani, Farhad; Garg, Siddharth; Erkip, Elza

doi:10.1109/isit.2019.8849392

Cited by 21 publications

(32 citation statements)

References 17 publications

“…statistics, we can show users have no privacy iff $m = \Omega\left(n^{\frac{2}{r-1}+\alpha}\right)$ and $a_n = O\left(n^{-\frac{1}{r-1}-\beta}\right)$; however, if the data trace of users is governed by a Markov chain, we can show users have no privacy iff $m = \Omega\left(n^{\frac{2}{|E|-r}+\alpha}\right)$ and $a_n = O\left(n^{-\frac{1}{|E|-r}-\beta}\right)$. Most of the previous work [19]-[25] that considers intra-user dependency assumes independence between the traces of different users, which is different from our work as described below.…”

Section: Also Define (mentioning)

confidence: 83%

“…The bulk of previous work assumes independence between the traces of different users. [19]-[25] have mostly considered temporal and spatial dependency within data traces, but not cross-user dependency. In [19], an obfuscation technique is employed to achieve privacy; however, for continuous Location-Based Services (LBS) queries, there is often strong temporal dependency in the locations.…”

Section: Introduction (mentioning)

confidence: 99%

Privacy of Dependent Users Against Statistical Matching

Takbiri, Houmansadr, Goeckel, et al. 2020

IEEE Trans. Inform. Theory

Modern applications significantly enhance user experience by adapting to each user's individual condition and/or preferences. While this adaptation can greatly improve a user's experience or be essential for the application to work, the exposure of user data to the application presents a significant privacy threat to the users, even when the traces are anonymized, since the statistical matching of an anonymized trace to prior user behavior can identify a user and their habits. Because of the current and growing algorithmic and computational capabilities of adversaries, provable privacy guarantees as a function of the degree of anonymization and obfuscation of the traces are necessary. Our previous work has established the requirements on anonymization and obfuscation in the case that data traces are independent between users. However, the data traces of different users will be dependent in many applications, and an adversary can potentially exploit such dependency. In this paper, we consider the impact of dependency between user traces on their privacy. First, we demonstrate that the adversary can readily identify the association graph of the obfuscated and anonymized version of the data, revealing which user data traces are dependent. Next, we demonstrate that the adversary can use this association graph to break user privacy with significantly shorter traces than in the case of independent users, and that obfuscating data traces independently across users is often insufficient to remedy such leakage. Finally, we discuss how users can improve privacy by employing joint obfuscation that removes or reduces the data dependency.

“…More recently, matching of correlated databases has been rigorously investigated in [6] and [7]. In [6], Shirani et al. developed a matching scheme based on joint typicality and derived necessary and sufficient conditions on the database growth rate for reliable matching using an extension of the Shannon-McMillan-Breiman Theorem and Fano's inequality. In [7], Cullina et…”

Section: Introduction (mentioning)

confidence: 99%

“…We model the above example as a database matching problem where the goal is to match the corresponding rows across databases such that the probability of mismatch goes to zero as the number of attributes in the database (number of columns) grows to infinity. The two databases are assumed to have the same number of users (rows) and are generated according to a bivariate stochastic process as in [6]. Unlike in [6], the second database suffers from column deletion.…”

Section: Al Introduced Cycle Mutual Information As A (mentioning)

confidence: 99%

“…The two databases are assumed to have the same number of users (rows) and are generated according to a bivariate stochastic process as in [6]. Unlike in [6], the second database suffers from column deletion. The indices of the deleted columns are not known due to synchronization errors similar to the deletion channel model [8].…”

Section: Al Introduced Cycle Mutual Information As A (mentioning)

confidence: 99%

Database Matching Under Column Deletions

Bakirtas, Erkip. 2021

Preprint (self-citation)

De-anonymizing user identities by matching various forms of user data available on the internet raises privacy concerns. A fundamental understanding of the privacy leakage in such scenarios requires a careful study of conditions under which correlated databases can be matched. Motivated by synchronization errors in time-indexed databases, in this work, matching of random databases under random column deletion is investigated. Adapting tools from information theory, in particular ones developed for the deletion channel, conditions for database matching in the absence and presence of deletion location information are derived, showing that partial deletion information significantly increases the achievable database growth rate for successful matching. Furthermore, given a batch of correctly matched rows, a deletion detection algorithm that provides partial deletion information is proposed and a lower bound on the algorithm's deletion detection probability in terms of the column size and the batch size is derived. The relationship between the database size and the batch size required to guarantee a given deletion detection probability using the proposed algorithm suggests that a batch size growing double-logarithmically with the row size is sufficient for a nonzero detection probability guarantee.

Matching of Markov Databases Under Random Column Repetitions

Bakirtas, Erkip. 2022

2022 56th Asilomar Conference on Signals, Systems, and Computers
