The technique of k-anonymization has been proposed in the literature as an alternative way to release public information while ensuring both data privacy and data integrity. We prove that two general versions of optimal k-anonymization of relations are NP-hard, including the suppression version, which amounts to choosing a minimum number of entries to delete from the relation. We also present a polynomial-time algorithm for optimal k-anonymity that achieves an approximation ratio independent of the size of the database when k is constant. In particular, it is an O(k log k)-approximation where the constant in the big-O is no more than 4. However, the runtime of the algorithm is exponential in k. A slightly more clever algorithm removes this condition, but is an O(k log m)-approximation, where m is the degree of the relation. We believe this algorithm could potentially be quite fast in practice.
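As a minimal sketch of the suppression setting (not the approximation algorithm from the abstract), the following Python code checks whether a relation is k-anonymous after some entries have been suppressed, and counts how many entries were suppressed. The names `suppress`, `is_k_anonymous`, `suppression_cost`, and the `*` marker are assumptions made for illustration.

```python
from collections import Counter

STAR = "*"  # assumed marker for a suppressed (deleted) entry


def suppress(relation, cells):
    """Return a copy of `relation` with each (row, column) pair in `cells` replaced by STAR."""
    out = [list(row) for row in relation]
    for r, c in cells:
        out[r][c] = STAR
    return [tuple(row) for row in out]


def is_k_anonymous(relation, k):
    """True if every row is identical to at least k - 1 other rows."""
    counts = Counter(tuple(row) for row in relation)
    return all(counts[tuple(row)] >= k for row in relation)


def suppression_cost(relation):
    """Number of suppressed entries, the quantity the optimization problem minimizes."""
    return sum(list(row).count(STAR) for row in relation)


if __name__ == "__main__":
    rel = [("a", "x"), ("a", "y"), ("b", "z"), ("b", "z")]
    anon = suppress(rel, [(0, 1), (1, 1)])
    print(is_k_anonymous(rel, 2), is_k_anonymous(anon, 2), suppression_cost(anon))
```

The optimization problem asks for a set of cells of minimum size that makes the check succeed. This sketch only verifies a candidate set.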
We present a novel method for exactly solving (in fact, counting solutions to) general constraint satisfaction optimization with at most two variables per constraint (e.g. MAX-2-CSP and MIN-2-CSP), which gives the first exponential improvement over the trivial algorithm. More precisely, the runtime bound is a constant factor improvement in the base of the exponent: the algorithm can count the number of optima in MAX-2-SAT and MAX-CUT instances in O(m^3 2^{ωn/3}) time, where ω < 2.376 is the matrix product exponent over a ring. When constraints have arbitrary weights, there is a (1 + ε)-approximation with roughly the same runtime, modulo polynomial factors. Our construction shows that improvement in the runtime exponent of either k-clique solution (even when k = 3) or matrix multiplication over GF(2) would improve the runtime exponent for solving 2-CSP optimization. Our approach also yields connections between the complexity of some (polynomial time) high dimensional search problems and some NP-hard problems. For example, if there are sufficiently faster algorithms for computing the diameter of n points in ℓ_1, then there is a (2 − ε)^n algorithm for MAX-LIN. These results may be construed as either lower bounds on the high-dimensional problems, or hope that better algorithms exist for the corresponding hard problems.
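For comparison, the trivial algorithm mentioned above can be sketched for MAX-CUT as follows: it enumerates all 2^n two-colorings of the nodes and counts those attaining the maximum cut. This is only the exhaustive baseline, not the matrix-multiplication-based method of the abstract, and the function name `count_max_cut_optima` is an assumption made for illustration.

```python
from itertools import product


def count_max_cut_optima(n, edges):
    """Return (maximum cut size, number of two-colorings attaining it).

    Nodes are 0..n-1 and `edges` is a list of (u, v) pairs.
    Runs in O(2^n * m) time by exhaustive enumeration.
    """
    best, count = -1, 0
    for side in product((0, 1), repeat=n):
        # An edge is cut when its endpoints receive different colors.
        cut = sum(1 for u, v in edges if side[u] != side[v])
        if cut > best:
            best, count = cut, 1
        elif cut == best:
            count += 1
    return best, count


if __name__ == "__main__":
    triangle = [(0, 1), (1, 2), (0, 2)]
    print(count_max_cut_optima(3, triangle))
```

Each cut is counted twice here, once per labeling of the two sides. Halving the count gives the number of distinct partitions.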
In low-depth circuit complexity, the polynomial method is a way to prove lower bounds by translating weak circuits into low-degree polynomials, then analyzing properties of these polynomials. Recently, this method found an application to algorithm design: Williams (STOC 2014) used it to compute all-pairs shortest paths in n^3 / 2^{Ω(√(log n))} time on dense n-node graphs. In this paper, we extend this methodology to solve a number of problems in combinatorial pattern matching and Boolean algebra, considerably faster than previously known methods. First, we give an algorithm for BOOLEAN ORTHOGONAL DETECTION, which is to detect among two sets A, B ⊆ {0, 1}^d of size n if there is an x ∈ A and y ∈ B such that ⟨x, y⟩ = 0. For vectors of dimension d = c(n) log n, we solve BOOLEAN ORTHOGONAL DETECTION in n^{2−1/O(log c(n))} time by a Monte Carlo randomized algorithm. We apply this as a subroutine in several other new algorithms:
For a pattern graph H on k nodes, we consider the problems of finding and counting the number of (not necessarily induced) copies of H in a given large graph G on n nodes, as well as finding minimum weight copies in both node-weighted and edge-weighted graphs. Our results include:
• The number of copies of an H with an independent set of size s can be computed exactly in O*(2^s n^{k−s+3}) time. A minimum weight copy of such an H (with arbitrary real weights on nodes and edges) can be found in [...]. (The O* notation omits poly(k) factors.) These algorithms rely on fast algorithms for computing the permanent of a k × n matrix, over rings and semirings.
• The number of copies of any H having minimum (or maximum) node-weight (with arbitrary real weights on nodes) can be found in O(n^{ωk/3} + n^{2k/3+o(1)}) time, where ω < 2.4 is the matrix multiplication exponent and k is divisible by 3. Similar results hold for other values of k. Also, the number of copies having exactly a prescribed weight can be found within this time. These algorithms extend the technique of Czumaj and Lingas (SODA 2007) and give a new (algorithmic) application of multiparty communication complexity.
• Finding an edge-weighted triangle of weight exactly 0 in general graphs requires Ω(n^{2.5−ε}) time for all ε > 0, unless the 3SUM problem on N numbers can be solved in O(N^{2−ε}) time. This suggests that the edge-weighted problem is much harder than its node-weighted version.
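To make the edge-weighted exact-weight problem in the last result concrete, here is a minimal cubic-time sketch that searches for a triangle whose three edge weights sum to 0. It is a naive baseline for stating the problem, not an algorithm from the abstract. The function name and the dictionary-of-frozensets edge representation are assumptions made for illustration.

```python
from itertools import combinations


def find_zero_weight_triangle(n, weight):
    """Return a triangle (u, v, w) whose edge weights sum to 0, or None.

    Nodes are 0..n-1 and `weight` maps frozenset({u, v}) to the weight of
    edge {u, v}. Pairs absent from `weight` are treated as non-edges.
    """
    for u, v, w in combinations(range(n), 3):
        e1, e2, e3 = frozenset((u, v)), frozenset((v, w)), frozenset((u, w))
        if e1 in weight and e2 in weight and e3 in weight:
            if weight[e1] + weight[e2] + weight[e3] == 0:
                return (u, v, w)
    return None


if __name__ == "__main__":
    weights = {
        frozenset((0, 1)): 2,
        frozenset((1, 2)): -5,
        frozenset((0, 2)): 3,
        frozenset((2, 3)): 1,
        frozenset((1, 3)): 1,
    }
    print(find_zero_weight_triangle(4, weights))
```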
We show how to compute any symmetric Boolean function on n variables over any field (as well as the integers) with a probabilistic polynomial of degree O(√(n log(1/ε))) and error at most ε. The degree dependence on n and ε is optimal, matching a lower bound of Razborov (1987) and Smolensky (1987) for the MAJORITY function. The proof is constructive: a low-degree polynomial can be efficiently sampled from the distribution. This polynomial construction is combined with other algebraic ideas to give the first subquadratic time algorithm for computing a (worst-case) batch of Hamming distances in superlogarithmic dimensions, exactly. To illustrate, let c(n) : N → N. Suppose we are given a database D of n vectors in {0, 1}^{c(n) log n} and a collection of n query vectors Q in the same dimension. For all u ∈ Q, we wish to compute a v ∈ D with minimum Hamming distance from u. We solve this problem in n^{2−1/O(c(n) log^2 c(n))} randomized time. Hence, the problem is in "truly subquadratic" time for O(log n) dimensions, and in subquadratic time for d = o((log^2 n)/(log log n)^2). We apply the algorithm to computing pairs with maximum inner product, closest pair in ℓ_1 for vectors with bounded integer entries, and pairs with maximum Jaccard coefficients.
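As a point of comparison for the batch problem above, the straightforward approach compares every query against every database vector, which takes on the order of n^2 · d bit operations for n queries, n database vectors, and dimension d. The sketch below implements only this quadratic baseline, not the polynomial-method algorithm of the abstract. The function name `batch_hamming_nearest` is an assumption made for illustration.

```python
def batch_hamming_nearest(D, Q):
    """For each query u in Q, return (index into D, distance) of a nearest vector.

    D and Q are lists of equal-length 0/1 tuples. Ties are broken by the
    smallest index in D.
    """
    # Pack each 0/1 vector into an int so a distance is one XOR plus a popcount.
    packed_d = [int("".join(map(str, v)), 2) for v in D]
    results = []
    for u in Q:
        pu = int("".join(map(str, u)), 2)
        dists = [bin(pu ^ pv).count("1") for pv in packed_d]
        best = min(range(len(D)), key=dists.__getitem__)
        results.append((best, dists[best]))
    return results


if __name__ == "__main__":
    D = [(0, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 1)]
    Q = [(1, 0, 0, 0), (1, 1, 1, 0)]
    print(batch_hamming_nearest(D, Q))
```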
We describe reductions from the problem of determining the satisfiability of Boolean CNF formulas (CNF-SAT) to several natural algorithmic problems. We show that attaining any of the following bounds would improve the state of the art in algorithms for SAT:
• a (computationally efficient) protocol for 3-party set disjointness with o(m) bits of communication,
• an O(n^{2−ε}) algorithm for 2-SAT with m = n^{1+o(1)} clauses, where two clauses may have unrestricted length, and
• an O((n + m)^{k−ε}) algorithm for HornSat with k unrestricted length clauses.
One may interpret our reductions as new attacks on the complexity of SAT, or sharp lower bounds conditional on exponential hardness of SAT.
The P vs NP problem arose from the question of whether exhaustive search is necessary for problems with short verifiable solutions. We do not know if even a slight algorithmic improvement over exhaustive search is universally possible for all NP problems, and to date no major consequences have been derived from the assumption that an improvement exists. We show that there are natural NP and BPP problems for which minor algorithmic improvements over the trivial deterministic simulation already entail lower bounds such as NEXP ⊄ P/poly and LOGSPACE ≠ NP. These results are especially interesting given that similar improvements have been found for many other hard problems. Optimistically, one might hope our results suggest a new path to lower bounds; pessimistically, they show that carrying out the seemingly modest program of finding slightly better algorithms for all search problems may be extremely difficult (if not impossible). We also prove unconditional superpolynomial time-space lower bounds for improving on exhaustive search: there is a problem verifiable with k(n) length witnesses in O(n^a) time (for some a and some function k(n) ≤ n) that cannot be solved in k(n)^c n^{a+o(1)} time and k(n)^c n^{o(1)} space, for every c ≥ 1. While such problems can always be solved by exhaustive search in O(2^{k(n)} n^a) time and O(k(n) + n^a) space, we can prove a superpolynomial lower bound in the parameter k(n) when space usage is restricted.