2016
DOI: 10.1007/978-3-319-45381-1_13
COCOA: A Synthetic Data Generator for Testing Anonymization Techniques

Abstract: Conducting extensive testing of anonymization techniques is critical to assess their robustness and to identify the scenarios where they are most suitable. However, access to real microdata is highly restricted, and the data that are publicly available are usually anonymized or aggregated, which reduces their value for testing purposes. In this paper, we present a framework (COCOA) for the generation of realistic synthetic microdata that allows multi-attribute relationships to be defined in order to preserv…

Cited by 18 publications (20 citation statements)
References 12 publications (13 reference statements)
“…For instance, the work on [6] presents a semiautomated approach to facilitate the grading of programming code structures, while the work on [7] presents an automated approach to protect sensitive information before sharing it with third parties. Similarly, other works have focused on automatically generating testing data [8], evaluating pedagogical e-learning content [9], or constructing generalisation hierarchies to anonymise categorical data [10]. In the particular case of plagiarism detection of source code, several tools have been developed in order to support users in this task [11].…”
Section: Background and Related Work (supporting)
confidence: 47%
“…Moreover, the work on [13] introduced an approach to assess the performance of a distributed memory program in a clustered environment. Meanwhile, other works have focused on generating realistic testing data [9], or providing techniques to facilitate the monitoring of performance counters [21]. Finally, other efforts have also centred on reducing the expertise required.…”
Section: Background and Related Work (contrasting)
confidence: 41%
“…Similarly, the work on [23] presents a technique to identify the early warning signs that typically precede a relevant performance degradation in a system. Moreover, some other works have centred on generating useful synthetic testing data [10,11], or on providing techniques that can reduce the expertise required in order to efficiently automate the usage of the diagnosis tools in the performance testing domain [20]. In contrast to these works, which aim to improve other facets of performance testing, our solution addresses the particular need of setting a suitable test workload for a particular application; hence, successfully isolating a user from the complexities of determining such workload.…”
Section: Background and Related Work (mentioning)
confidence: 99%