2015
DOI: 10.1002/asi.23363

C‐sanitized: A privacy model for document redaction and sanitization

Abstract: Within the current context of Information Societies, large amounts of information are daily exchanged and/or released. The sensitive nature of much of this information causes a serious privacy threat when documents are uncontrollably made available to untrusted third parties. In such cases, appropriate data protection measures should be undertaken by the responsible organization, especially under the umbrella of current legislations on data privacy. To do so, human experts are usually requested to redact or sa…

Cited by 67 publications (96 citation statements)
References 34 publications
“…As noted in works such as (Anandan & Clifton, 2011; Sánchez, Batet, & Viejo, 2014b), semantic correlations between textual terms can cause additional disclosure risks (e.g., treatments or drugs closely related to a sensitive disease can disclose the latter). Additional semantic analyses are required to deal with this issue (Sánchez & Batet, 2015; Sánchez et al., 2014b).…”
Section: Accepted Manuscript (mentioning)
confidence: 99%
“…Similarly, for plain text data, approaches such as [16,17] address (i) by using models to 'recognize several classes of PII' such as names and credit cards, while [47] focuses on (ii), that is, sanitizing an entity c by removing all terms t that can identify c individually or in aggregate in a knowledge base K. Indeed, any privacy-preserving algorithm that places an a priori classification on sensitive data types assumes boundaries on an attacker's side knowledge and a finite limit on potentially new classes of personal identifiers. Our approach with d_χ-privacy aims to do away with such assumptions to provide tunable privacy guarantees.…”
Section: Related Work (mentioning)
confidence: 99%
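
The sanitization criterion summarized in the excerpt above (redacting every term t that would let a knowledge base K identify a protected entity c) can be illustrated with a small, self-contained sketch. Everything below is an illustrative assumption rather than the paper's exact formulation: the toy corpus stands in for the knowledge base K, disclosure is measured with pointwise mutual information, and the names CORPUS, prob, information_content, disclosure, c_sanitize and the alpha parameter are all hypothetical helpers introduced for this sketch only.

import math

# Toy stand-in for the knowledge base K: each "document" is a set of terms.
CORPUS = [
    {"patient", "aids", "hiv", "antiretroviral"},
    {"patient", "flu", "antiviral"},
    {"patient", "aids", "antiretroviral"},
    {"patient", "hiv"},
    {"patient", "flu"},
    {"patient", "checkup"},
]

def prob(*terms):
    """Fraction of documents in the toy corpus containing all given terms."""
    hits = sum(1 for doc in CORPUS if all(t in doc for t in terms))
    return hits / len(CORPUS)

def information_content(term):
    # IC(t) = -log2 p(t): rarer terms carry more information.
    return -math.log2(prob(term))

def disclosure(term, entity):
    # Pointwise mutual information PMI(t; c): how much observing `term`
    # reveals about the protected entity `entity` according to the corpus.
    p_joint = prob(term, entity)
    if p_joint == 0:
        return float("-inf")  # terms that never co-occur disclose nothing here
    return math.log2(p_joint / (prob(term) * prob(entity)))

def c_sanitize(terms, entity, alpha=0.75):
    # Redact every term whose disclosure about `entity` reaches a fraction
    # `alpha` of the entity's own information content. `alpha` is an
    # illustrative knob, not a parameter taken from the paper.
    threshold = alpha * information_content(entity)
    return [t for t in terms if disclosure(t, entity) < threshold]

print(c_sanitize(["antiretroviral", "flu", "checkup"], "aids"))
# -> ['flu', 'checkup']: "antiretroviral" co-occurs with "aids" often enough
#    to disclose it, so it is redacted; the unrelated terms are kept.

On this toy corpus the strongly correlated term is removed while innocuous terms survive, which is the behaviour the citing excerpt attributes to the model: disclosure is judged against aggregate evidence in K rather than against a fixed list of identifier classes.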
“…We believe that the information science community is particularly well positioned to contribute to the current privacy discussion and to shape the solution space with innovative ideas. Indeed, a quick survey of JASIST publications during the past decade (2008-2018) shows that more than 30 articles have tackled privacy issues in various empirical contexts, including mobile health (Clarke & Steele, 2015; Harvey & Harvey, 2014), social media platforms (Squicciarini, Xu, & Zhang, 2011; Stern & Kumar, 2014), as well as new ways to model and measure privacy in academic research (Rubel & Biava, 2014; Sánchez & Batet, 2016). Collectively, these studies span a broad spectrum of intellectual traditions in the community and demonstrate nuanced understandings of the relationship between ICTs and privacy.…”
Section: Introduction (mentioning)
confidence: 99%