Recent advances in optimization techniques for statistical tabular data protection

Castro, Jordi

doi:10.1016/j.ejor.2011.03.050

Cited by 25 publications

(38 citation statements)

References 33 publications

“…since one seeks for the released values x that are closest (in the given norm) to the true values a, compatible with the relationships that a is known to have to satisfy, and protected according to (6). Of course, the disjunctive constraints (6) are the difficult part of the problem, their feasible region being nonconvex.…”

Section: Formulations of the CTA Problem (mentioning)

confidence: 99%
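
The conditions described in the statement above can be illustrated with a small feasibility check. This is only a sketch under stated assumptions, not code from the cited paper: the function names, the tiny three-cell table, and the protection levels are hypothetical, and the disjunctive constraint referred to as (6) is assumed to take the usual form "released value at least the true value plus an upper protection level, or at most the true value minus a lower protection level".

```python
from math import sqrt


def satisfies_relations(x, relations, tol=1e-9):
    """True if every linear table relation sum_j coef[j] * x[j] == rhs holds."""
    return all(
        abs(sum(coef * x[j] for j, coef in coeffs.items()) - rhs) <= tol
        for coeffs, rhs in relations
    )


def is_protected(x, a, protection, tol=1e-9):
    """True if each sensitive cell lies outside its protection interval.

    protection maps a cell index to (lower_level, upper_level); the check is
    the disjunction x[i] >= a[i] + upper_level OR x[i] <= a[i] - lower_level.
    """
    return all(
        x[i] >= a[i] + upl - tol or x[i] <= a[i] - lpl + tol
        for i, (lpl, upl) in protection.items()
    )


def l2_distance(x, a):
    """Euclidean distance between the released table x and the true table a."""
    return sqrt(sum((xi - ai) ** 2 for xi, ai in zip(x, a)))


def is_cta_feasible(x, a, relations, protection):
    """A released table is feasible if it is both additive and protected."""
    return satisfies_relations(x, relations) and is_protected(x, a, protection)


# Hypothetical table: two inner cells and their total (cell 0 + cell 1 = cell 2).
a = [10.0, 20.0, 30.0]
relations = [({0: 1.0, 1: 1.0, 2: -1.0}, 0.0)]
protection = {0: (3.0, 3.0)}  # cell 0 is sensitive, protection levels of 3

for x in ([13.0, 17.0, 30.0], [11.0, 19.0, 30.0], [13.0, 20.0, 30.0]):
    print(x, is_cta_feasible(x, a, relations, protection), round(l2_distance(x, a), 3))
```

The check only verifies a candidate table; choosing the closest feasible table is the hard part, because each sensitive cell contributes an "up or down" decision and the feasible region is therefore nonconvex.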

“…This justifies the interest in statistical disclosure control, i.e., the set of techniques that can be deployed to protect sensitive information. In particular, the focus of this work is on tabular data protection; seminal work on this field can be found in [2], and the current state-of-the-art is described in the recent surveys of [25] and [6], as well as in the monographs [27,22]. Although tabular data provide aggregated information, the publication of some cells may jeopardize individual information.…”

Section: Introduction (mentioning)

confidence: 99%

Perspective Reformulations of the CTA Problem with L₂ Distances

Castro, Frangioni, Gentile. Operations Research, 2014.

Self Cite

Any institution that disseminates data in aggregated form has the duty to ensure that individual confidential information is not disclosed, either by not releasing data or by perturbing the released data, while maintaining data utility. Controlled tabular adjustment (CTA) is a promising technique of the second type where a protected table that is close to the original one in some chosen distance is constructed. We attempt, for the first time, to solve CTA with Euclidean distances; this gives rise to difficult Mixed Integer Quadratic Problems (MIQPs) with pairs of linked semi-continuous variables. We provide a novel analysis of Perspective Reformulations (PRs) for this special structure; in particular, we devise a Projected PR (P²R) which is piecewise-conic but simplifies to a (nonseparable) MIQP when the instance is symmetric. We then compare different formulations of the CTA problem, showing that the ones based on P²R most often obtain better computational results.
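
For orientation, the following sketches how a CTA model with squared Euclidean distance and pairs of linked semi-continuous variables is commonly written; it is not quoted from the abstract above. All symbols are assumptions introduced here: $a$ is the original table, $A$ the matrix of table relations, $w_i$ cell weights, $l_i \le a_i \le u_i$ cell bounds, $\mathcal{S}$ the set of sensitive cells, $lpl_i$ and $upl_i$ the lower and upper protection levels, and the released table is $x = a + z^+ - z^-$.

```latex
\begin{aligned}
\min_{z^+,\, z^-,\, y} \quad & \sum_{i} w_i \left( (z_i^+)^2 + (z_i^-)^2 \right) \\
\text{s.t.} \quad & A\,(z^+ - z^-) = 0, \\
& 0 \le z_i^+ \le u_i - a_i, \quad 0 \le z_i^- \le a_i - l_i, && i \notin \mathcal{S}, \\
& upl_i \, y_i \le z_i^+ \le (u_i - a_i)\, y_i, && i \in \mathcal{S}, \\
& lpl_i \, (1 - y_i) \le z_i^- \le (a_i - l_i)\,(1 - y_i), && i \in \mathcal{S}, \\
& y_i \in \{0, 1\}, && i \in \mathcal{S}.
\end{aligned}
```

In this sketch, setting $y_i = 1$ forces $z_i^- = 0$ and $z_i^+ \ge upl_i$ (upward protection), while $y_i = 0$ forces $z_i^+ = 0$ and $z_i^- \ge lpl_i$ (downward protection), which is what links the two semi-continuous variables of each sensitive cell.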

“…For each cell, the table may report either the number of individuals (frequency tables) or information about another variable (magnitude tables). More details can be found in the recent survey [5] and the monographs [26,27]. Although cell tables report aggregated information for several respondents (so they could be considered anonymized), there is a risk of disclosing individual data.…”

Section: Introduction (mentioning)

confidence: 99%

“…These tables are obtained by crossing a particular categorical variable with a set of, say, h categorical variables that have a hierarchical relation; this results in a set of h two-dimensional tables with some common cells. For instance, Figure 2 (from [5]) illustrates a particular 1H2D table. The left subtable shows number of respondents for "region"×"profession"; the middle subtable is a "zoom in" of region R₂, providing the number of respondents in municipalities of this region; finally the right subtable details the ZIP codes of municipality R₂₁.…”

Section: Introduction (mentioning)

confidence: 99%

“…Several approaches have been tried to speed up the solution time. A straightforward Benders reformulation of the problem was attempted in [7], but promising results were only obtained for two-dimensional tables (i.e., tables obtained by crossing two categorical variables, whose constraints are represented by a node-arc network incidence matrix [5]). Heuristic and metaheuristic methods were attempted in [22], but they only solved small two-dimensional and three-dimensional tables of up to 625 and 8000 cells, respectively, while we consider in this work much more complex synthetic and real tables from the literature, of up to 200000 and 36000 cells, respectively.…”

Section: Introduction (mentioning)

confidence: 99%

Fix-and-relax approaches for controlled tabular adjustment

Baena, Castro, González. Computers & Operations Research, 2015.

Self Cite

Controlled tabular adjustment (CTA) is a relatively new protection technique for tabular data protection. CTA formulates a mixed integer linear programming problem, which is challenging for tables of moderate size. Even finding a feasible initial solution may be a challenging task for large instances. On the other hand, end users of tabular data protection techniques give priority to fast executions and are thus satisfied in practice with suboptimal solutions. This work has two goals. First, the fix-and-relax (FR) strategy is applied to obtain good feasible initial solutions to large CTA instances. FR is based on partitioning the set of binary variables into clusters to selectively explore a smaller branch-and-cut tree. Secondly, the FR solution is used as a warm start for a block coordinate descent (BCD) heuristic (approach named FR+BCD); BCD was confirmed to be a good option for large CTA instances in an earlier paper by the second and third co-authors (Computers & Operations Research 2011). We report extensive computational results on a set of real-world and synthetic CTA instances. FR is shown to be competitive compared to CPLEX branch-and-cut in terms of quickly finding either a feasible solution or a good upper bound. FR+BCD improved the quality of FR solutions for approximately 25% and 50% of the synthetic and real-world instances, respectively. FR or FR+BCD provided similar or better solutions in less CPU time than CPLEX for 73% of the difficult real-world instances.

References

2012

Statistical Disclosure Control
