2015
DOI: 10.1002/asi.23363

C‐sanitized: A privacy model for document redaction and sanitization

Abstract: Within the current context of Information Societies, large amounts of information are daily exchanged and/or released. The sensitive nature of much of this information causes a serious privacy threat when documents are uncontrollably made available to untrusted third parties. In such cases, appropriate data protection measures should be undertaken by the responsible organization, especially under the umbrella of current legislations on data privacy. To do so, human experts are usually requested to redact or sa…

Cited by 67 publications (96 citation statements)
References 34 publications
“…As noted in works such as (Anandan & Clifton, 2011; Sánchez, Batet, & Viejo, 2014b), semantic correlations between textual terms can cause additional disclosure risks (e.g., treatments or drugs closely related to a sensitive disease can disclose the latter). Additional semantic analyses are required to deal with this issue (Sánchez & Batet, 2015; Sánchez et al., 2014b).…”
Section: Accepted Manuscript (mentioning)
confidence: 99%
“…Similarly, for plain text data, approaches such as [16,17] address (i) by using models to 'recognize several classes of PII' such as names and credit cards, while [47] focuses on (ii), that is, sanitizing an entity c by removing all terms t that can identify c individually or in aggregate in a knowledge base K. Indeed, any privacy-preserving algorithm that places an a priori classification on sensitive data types assumes boundaries on an attacker's side knowledge and a finite limit on potentially new classes of personal identifiers. Our approach with d_χ-privacy aims to do away with such assumptions to provide tunable privacy guarantees.…”
Section: Related Work (mentioning)
confidence: 99%
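
The sanitization criterion summarized in the excerpt above (redacting every term t that would let a knowledge base K identify a protected entity c) can be illustrated with a small, self-contained sketch. Everything below is an illustrative assumption rather than the paper's exact formulation: the toy corpus stands in for the knowledge base K, disclosure is measured with pointwise mutual information, and the names CORPUS, prob, information_content, disclosure, c_sanitize and the alpha parameter are all hypothetical helpers introduced for this sketch only.

import math

# Toy stand-in for the knowledge base K: each "document" is a set of terms.
CORPUS = [
    {"patient", "aids", "hiv", "antiretroviral"},
    {"patient", "flu", "antiviral"},
    {"patient", "aids", "antiretroviral"},
    {"patient", "hiv"},
    {"patient", "flu"},
    {"patient", "checkup"},
]

def prob(*terms):
    """Fraction of documents in the toy corpus containing all given terms."""
    hits = sum(1 for doc in CORPUS if all(t in doc for t in terms))
    return hits / len(CORPUS)

def information_content(term):
    # IC(t) = -log2 p(t): rarer terms carry more information.
    return -math.log2(prob(term))

def disclosure(term, entity):
    # Pointwise mutual information PMI(t; c): how much observing `term`
    # reveals about the protected entity `entity` according to the corpus.
    p_joint = prob(term, entity)
    if p_joint == 0:
        return float("-inf")  # terms that never co-occur disclose nothing here
    return math.log2(p_joint / (prob(term) * prob(entity)))

def c_sanitize(terms, entity, alpha=0.75):
    # Redact every term whose disclosure about `entity` reaches a fraction
    # `alpha` of the entity's own information content. `alpha` is an
    # illustrative knob, not a parameter taken from the paper.
    threshold = alpha * information_content(entity)
    return [t for t in terms if disclosure(t, entity) < threshold]

print(c_sanitize(["antiretroviral", "flu", "checkup"], "aids"))
# -> ['flu', 'checkup']: "antiretroviral" co-occurs with "aids" often enough
#    to disclose it, so it is redacted; the unrelated terms are kept.

On this toy corpus the strongly correlated term is removed while innocuous terms survive, which is the behaviour the citing excerpt attributes to the model: disclosure is judged against aggregate evidence in K rather than against a fixed list of identifier classes.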
“…We believe that the information science community is particularly well positioned to contribute to the current privacy discussion and to shape the solution space with innovative ideas. Indeed, a quick survey of JASIST publications during the past decade (2008-2018) shows that more than 30 articles have tackled privacy issues in various empirical contexts, including mobile health (Clarke & Steele, 2015; Harvey & Harvey, 2014), social media platforms (Squicciarini, Xu, & Zhang, 2011; Stern & Kumar, 2014), as well as new ways to model and measure privacy in academic research (Rubel & Biava, 2014; Sánchez & Batet, 2016). Collectively, these studies span a broad spectrum of intellectual traditions in the community and demonstrate nuanced understandings of the relationship between ICTs and privacy.…”
Section: Introduction (mentioning)
confidence: 99%