The classical Direct-Product Theorem for circuits says that if a Boolean function f : {0, 1}^n → {0, 1} is somewhat hard to compute on average by small circuits, then the corresponding k-wise direct product f^k : ({0, 1}^n)^k → {0, 1}^k (defined by f^k(x_1, ..., x_k) = (f(x_1), ..., f(x_k)), where each x_i ∈ {0, 1}^n) is significantly harder to compute on average by slightly smaller circuits. We prove a fully uniform version of the Direct-Product Theorem with information-theoretically optimal parameters, up to constant factors. Namely, we show that for given k and ε, there is an efficient randomized algorithm A with the following property. Given a circuit C that computes f^k on at least an ε fraction of inputs, the algorithm A outputs, with probability at least 3/4, a list of O(1/ε) circuits such that at least one circuit on the list computes f on more than a 1 − δ fraction of inputs, for δ = O((log 1/ε)/k); moreover, each output circuit is an AC^0 circuit (of size poly(n, k, log 1/δ, 1/ε)) with oracle access to the circuit C. Using the Goldreich-Levin decoding algorithm [GL89], we also get a fully uniform version of Yao's XOR Lemma [Yao82] with optimal parameters, up to constant factors. Our results simplify and improve those in [IJK06]. Our main result may be viewed as an efficient, approximate, local, list-decoding algorithm for direct-product codes (encoding a function by its values on all k-tuples) with optimal parameters. We generalize it to a family of "derandomized" direct-product codes, which we call intersection codes, where the encoding provides values of the function only on a subfamily of k-tuples. The quality of the decoding algorithm is then determined by the sampling properties of the sets in this family and of their intersections. As a direct consequence of this generalization, we obtain the first derandomized direct-product result in the uniform setting, allowing hardness amplification with only a constant (as opposed to a factor-of-k) increase in the input length.
Finally, this general setting naturally allows the decoding of concatenated codes, which further yields nearly optimal derandomized amplification.
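As an illustrative sketch (not the paper's decoder), the direct-product code and a naive voting decoder can be demonstrated in a few lines: f is encoded by its values on k-tuples, and an oracle C that is often correct on tuples lets us recover f(x) by embedding x in random tuples and taking a majority vote. All names (f, C, k) and the noise model are toy assumptions for illustration.

```python
import random

# Toy sketch of a k-wise direct-product code with a majority-vote decoder.
# This is NOT the paper's list-decoding algorithm; it only illustrates the
# encoding and the idea of querying an imperfect oracle on random tuples.

n_bits = 4
k = 5
inputs = list(range(2 ** n_bits))

def f(x):                     # the "hard" function (here: parity of bits)
    return bin(x).count("1") % 2

def f_k(xs):                  # direct product: f applied coordinate-wise
    return tuple(f(x) for x in xs)

def noisy_C(xs, p_correct=0.9, rng=random.Random(0)):
    """Simulated oracle: returns f^k(xs) with prob. p_correct, else garbage."""
    if rng.random() < p_correct:
        return f_k(xs)
    return tuple(rng.randint(0, 1) for _ in xs)

def vote_decode(x, trials=101, rng=random.Random(1)):
    """Estimate f(x): plant x in random k-tuples, take majority of C's answers."""
    votes = 0
    for _ in range(trials):
        pos = rng.randrange(k)
        xs = [rng.choice(inputs) for _ in range(k)]
        xs[pos] = x
        votes += noisy_C(tuple(xs), rng=rng)[pos]
    return int(votes > trials / 2)

assert all(vote_decode(x) == f(x) for x in inputs)
```

The interesting regime in the paper is the opposite one, where C is correct on only a small ε fraction of tuples and a single majority vote no longer suffices, which is why the algorithm must output a short list of candidate circuits.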
We show that any distribution on {−1, +1}^n that is k-wise independent fools any halfspace (a.k.a. linear threshold function) h : {−1, +1}^n → {−1, +1}, h(x) = sgn(w_1 x_1 + ... + w_n x_n − θ), where w_1, ..., w_n, θ are arbitrary real numbers, with error ε for k = O(ε^{−2} log^2(1/ε)). Our result is tight up to log(1/ε) factors. Using standard constructions of k-wise independent distributions, we obtain the first explicit pseudorandom generators G : {−1, +1}^s → {−1, +1}^n that fool halfspaces. Specifically, we fool halfspaces with error ε and seed length s = k · log n = O(log n · ε^{−2} log^2(1/ε)). Our approach combines classical tools from real approximation theory with structural results on halfspaces by Servedio (Comput. Complexity 2007).
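A minimal numerical sketch of the statement, using the standard polynomial-over-GF(p) construction of (approximately) k-wise independent bits; all parameters (n, k, p, the Gaussian weights) are illustrative choices, not the paper's:

```python
import random

# Hedged sketch: sample n approximately unbiased +/-1 values by evaluating a
# random degree-(k-1) polynomial over GF(p) at n fixed points and taking the
# low bit (the parity map has bias 1/p for odd p, fine for a demo).  Then
# compare a halfspace's acceptance probability under these samples vs. fully
# independent uniform samples; the theorem says the gap is small.

p = 101          # prime > n
n = 20
k = 4

def kwise_sample(rng):
    coeffs = [rng.randrange(p) for _ in range(k)]      # random poly, deg < k
    def poly(x):
        acc = 0
        for c in reversed(coeffs):                     # Horner evaluation
            acc = (acc * x + c) % p
        return acc
    return [1 if poly(i) % 2 == 0 else -1 for i in range(n)]

def halfspace(x, w, theta):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else -1

rng = random.Random(0)
w = [rng.gauss(0, 1) for _ in range(n)]
theta = 0.0

trials = 20000
kwise = sum(halfspace(kwise_sample(rng), w, theta) == 1
            for _ in range(trials)) / trials
unif = sum(halfspace([rng.choice([-1, 1]) for _ in range(n)], w, theta) == 1
           for _ in range(trials)) / trials
print(abs(kwise - unif))   # small, as the theorem predicts
```

Note the seed length here is k field elements, i.e. s = O(k log n) bits, matching the generator described in the abstract.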
Ashtiani et al. (NIPS 2016) introduced a semi-supervised framework for clustering (SSAC) where a learner is allowed to make same-cluster queries. More specifically, in their model, there is a query oracle that answers queries of the form "given any two vertices, do they belong to the same optimal cluster?". In many clustering contexts, such oracle queries are feasible. Ashtiani et al. showed the usefulness of this query framework by giving a polynomial time algorithm for the k-means clustering problem where the input dataset satisfies some separation condition. Ailon et al. extended the above work to the approximation setting by giving an efficient (1 + ε)-approximation algorithm for k-means for any small ε > 0 and any dataset within the SSAC framework. In this work, we extend this line of study to the correlation clustering problem. Correlation clustering is a graph clustering problem where pairwise similarity (or dissimilarity) information is given for every pair of vertices, and the objective is to partition the vertices into clusters that minimise disagreement (or maximise agreement) with the pairwise information given as input. These problems are popularly known as the MinDisAgree and MaxAgree problems, and MinDisAgree[k] and MaxAgree[k] are versions of these problems where the number of optimal clusters is at most k. There exist Polynomial Time Approximation Schemes (PTAS) for MinDisAgree[k] and MaxAgree[k] where the approximation guarantee is (1 + ε) for any small ε and the running time is polynomial in the input parameters but exponential in k and 1/ε. We get a significant running time improvement within the SSAC framework at the cost of making a small number of same-cluster queries. We obtain a (1 + ε)-approximation algorithm for any small ε with running time that is polynomial in the input parameters and also in k and 1/ε.
We also give non-trivial upper and lower bounds on the number of same-cluster queries, the lower bound being based on the Exponential Time Hypothesis (ETH). Note that the existence of an efficient algorithm for MinDisAgree[k] in the SSAC setting exhibits the power of same-cluster queries, since such a polynomial time algorithm (polynomial even in k and 1/ε) is not possible in the classical (non-query) setting due to our conditional lower bounds. Our conditional lower bound is particularly interesting, as it not only establishes a lower bound on the number of same-cluster queries in the SSAC framework but also establishes a conditional lower bound on the running time of any (1 + ε)-approximation algorithm for MinDisAgree[k].
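To make the same-cluster query primitive concrete, here is a minimal sketch (not any of the cited algorithms): by keeping one representative per cluster discovered so far, every point can be labeled with at most k queries, so exact recovery of a k-clustering costs O(nk) queries. The oracle below is simulated from a hidden ground-truth labeling, which is an assumption of this demo.

```python
# Hedged sketch of the same-cluster query primitive from the SSAC model.

def cluster_with_queries(points, same_cluster):
    """Label every point using same-cluster queries against representatives."""
    reps = []                    # one representative point per cluster found
    labels = {}
    queries = 0
    for x in points:
        for c, r in enumerate(reps):
            queries += 1
            if same_cluster(x, r):
                labels[x] = c
                break
        else:                    # x matched no representative: new cluster
            labels[x] = len(reps)
            reps.append(x)
    return labels, queries

# simulated oracle backed by a hidden ground-truth labeling
truth = {0: 0, 1: 1, 2: 0, 3: 2, 4: 1, 5: 0}
oracle = lambda a, b: truth[a] == truth[b]
labels, q = cluster_with_queries(list(truth), oracle)
print(labels, q)   # recovers the hidden clustering with at most nk queries
```

The algorithmic content of the papers above is in doing far better than this naive bound, i.e. achieving (1 + ε)-approximation with only a small number of queries rather than labeling every point.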
The classical center-based clustering problems such as k-means/median/center assume that the optimal clusters satisfy a locality property, namely that the points in the same cluster are close to each other. A number of clustering problems arise in machine learning where the optimal clusters do not satisfy such a locality property. For instance, consider the r-gather clustering problem, where there is an additional constraint that each of the clusters should have at least r points, or the capacitated clustering problem, where there is an upper bound on the cluster sizes. Consider a variant of the k-means problem that may be regarded as a general version of such problems: here, the optimal clusters O_1, ..., O_k are an arbitrary partition of the dataset, and the goal is to output k centers c_1, ..., c_k such that the objective function Σ_{i=1}^{k} Σ_{x ∈ O_i} ||x − c_i||² is minimised. It is not difficult to argue that any algorithm (without knowing the optimal clusters) that outputs a single set of k centers will not behave well as far as optimising the above objective function is concerned. However, this does not rule out the existence of algorithms that output a list of such sets of k centers such that at least one of them behaves well. Given an error parameter ε > 0, let ℓ denote the size of the smallest list of sets of k centers such that at least one of them gives a (1 + ε)-approximation w.r.t. the objective function above. In this paper, we show an upper bound on ℓ by giving a randomized algorithm that outputs a list of 2^{Õ(k/ε)} sets of k centers. We also give a closely matching lower bound of 2^{Ω̃(k/ε)}. This is a significant improvement over the previous result of Ding and Xu [DX15], who gave an algorithm with running time O(nd · (log n)^k · 2^{poly(k/ε)}) and output a list of size O((log n)^k · 2^{poly(k/ε)}). Our techniques generalize to the k-median problem and to many other settings where non-Euclidean distance measures are involved.
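A small sketch of the objective being discussed (toy data and toy candidate generation, not the paper's algorithm): for a fixed, possibly non-local partition, a candidate set of centers is scored under the best matching of centers to clusters, and a list of candidate sets succeeds if its best member is close to the centroid-optimal cost.

```python
import itertools
import random

# Hedged illustration of the "list of k-center sets" objective in 1-D.

def cost(partition, centers):
    """Min over center-to-cluster matchings of the sum of squared distances."""
    k = len(partition)
    best = float("inf")
    for perm in itertools.permutations(range(k)):
        total = sum((x - centers[perm[i]]) ** 2
                    for i, cluster in enumerate(partition) for x in cluster)
        best = min(best, total)
    return best

def centroid_cost(partition):
    """Optimal cost: each cluster is served by its own centroid."""
    return sum(sum((x - sum(c) / len(c)) ** 2 for x in c) for c in partition)

partition = [[0.0, 1.0, 9.0], [4.0, 5.0]]    # a non-local partition
rng = random.Random(0)
points = [x for c in partition for x in c]
# toy candidate list: sets of k = 2 centers sampled from the data points
candidates = [tuple(rng.choice(points) for _ in range(2)) for _ in range(50)]
best = min(cost(partition, c) for c in candidates)
print(best, centroid_cost(partition))        # best list member vs. optimum
```

The paper's contribution is showing that a list of size 2^{Õ(k/ε)} suffices to contain a (1 + ε)-approximate set of centers for *every* possible partition simultaneously, which this toy enumeration does not attempt.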
Consider a challenge-response protocol where the probability of a correct response is at least α for a legitimate user and at most β < α for an attacker. One example is a CAPTCHA challenge, where a human should have a significantly higher chance of answering a single challenge (e.g., uncovering a distorted letter) than an attacker; another example is an argument system without perfect completeness. A natural approach to boost the gap between legitimate users and attackers is to issue many challenges and accept if the fraction of correct responses exceeds a threshold chosen between β and α. We give the first proof that parallel repetition with thresholds improves the security of such protocols. We do this with a very general result about an attacker's ability to solve a large fraction of many independent instances of a hard problem, showing a Chernoff-like convergence of the fraction solved incorrectly to the probability of failure for a single instance.
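The quantitative effect of thresholded repetition is easy to see numerically under the (idealized) assumption that the attacker's successes on independent challenges behave like independent coin flips; the values of α, β, n, and the threshold below are illustrative only:

```python
from math import comb

# Hedged numerical sketch: with per-challenge success alpha for a legitimate
# user and beta for an attacker, n parallel challenges with an acceptance
# threshold t between beta*n and alpha*n drive the attacker's pass probability
# down exponentially while the legitimate user still passes w.h.p.

def pass_prob(p, n, t):
    """P(Binomial(n, p) >= t): probability of clearing the threshold."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(t, n + 1))

alpha, beta, n = 0.9, 0.5, 100
t = 70                            # threshold fraction 0.7, between beta and alpha
print(pass_prob(alpha, n, t))     # close to 1 for the legitimate user
print(pass_prob(beta, n, t))      # exponentially small for the attacker
```

The hard part, and the content of the result above, is that an attacker facing many instances of a hard problem need not behave like independent coin flips, yet a Chernoff-like tail bound still holds.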