2022
DOI: 10.1162/coli_a_00458

The Text Anonymization Benchmark (TAB): A Dedicated Corpus and Evaluation Framework for Text Anonymization

Abstract: We present a novel benchmark and associated evaluation metrics for assessing the performance of text anonymization methods. Text anonymization, defined as the task of editing a text document to prevent the disclosure of personal information, currently suffers from a shortage of privacy-oriented annotated text resources, making it difficult to properly evaluate the level of privacy protection offered by various anonymization methods. This paper presents TAB (Text Anonymization Benchmark), a new, open-source ann…

Cited by 18 publications (7 citation statements)
References: 69 publications
“…In O'Shaughnessy and Lin's study [3], they propose research on privacy protection practice for data mining with multiple data sources, putting forward a framework for reconstructing data and putting it to use in data clustering projects. Leveraging the IoT, information collection can be conducted at any time, from anywhere, using various devices, and through any transmission channel, which shows the necessity of privacy-preservation [6].…”
Section: Applications Of Privacy-preserving Multilingual Comparable C...
Citation type: mentioning
confidence: 99%
“…Table 2 shows the existing research on applications of multilingual comparable corpora in the IoT, including the privacy-preserving aspects in the above review. Paper Summary [3,6,22] Privacy protection practice for data mining and construction corpus for multiple data clustering; multilingual comparable corpus used in language teaching; evaluation of privacy-oriented corpus by use of text anonymization. [4,8,30,36] Privacy protection for medical data, industry data and railway data, less coverage of multilingual comparable data in IoT.…”
Section: Applications Of Privacy-preserving Multilingual Comparable C...
Citation type: mentioning
confidence: 99%
“…In real-world scenarios, obtaining certain data types such as documents containing personal information (e.g., identification cards, medical receipts, and prescription receipts), voice phishing-related audio data for detecting phishing attempts, and documents with substantial defense information can be challenging [17,18,19]. Consequently, researchers frequently rely on synthetic data to generate these datasets.…”
Section: User
Citation type: mentioning
confidence: 99%