A number of recent works have studied algorithms for entrywise $\ell_p$-low rank approximation, namely algorithms which, given an $n \times d$ matrix $A$ (with $n \ge d$), output a rank-$k$ matrix $B$ minimizing $\|A - B\|_p^p = \sum_{i,j} |A_{i,j} - B_{i,j}|^p$ when $p > 0$, and $\|A - B\|_0 = \sum_{i,j} [A_{i,j} \ne B_{i,j}]$ for $p = 0$, where $[\cdot]$ is the Iverson bracket; that is, $\|A - B\|_0$ denotes the number of entries $(i, j)$ for which $A_{i,j} \ne B_{i,j}$. For $p = 1$, this is often considered more robust than the SVD, while for $p = 0$ this corresponds to minimizing the number of disagreements, or robust PCA. This problem is known to be NP-hard for $p \in \{0, 1\}$, already for $k = 1$, and while there are polynomial time approximation algorithms, their approximation factor is at best $\mathrm{poly}(k)$. It was left open whether there is a polynomial-time approximation scheme (PTAS) for $\ell_p$-approximation for any $p \ge 0$. We show the following:
1. On the algorithmic side, for $p \in (0, 2)$, we give the first $n^{\mathrm{poly}(k/\varepsilon)}$ time $(1 + \varepsilon)$-approximation algorithm. For $p = 0$, there are various problem formulations, a common one being the binary setting in which $A \in \{0, 1\}^{n \times d}$ and $B = U \cdot V$, where $U \in \{0, 1\}^{n \times k}$ and $V \in \{0, 1\}^{k \times d}$. There are also various notions of multiplication $U \cdot V$, such as a matrix product over the reals, over a finite field, or over a Boolean semiring. We give the first almost-linear time approximation scheme for what we call the Generalized Binary $\ell_0$-Rank-$k$ problem, for which these variants are special cases. Our algorithm computes a $(1 + \varepsilon)$-approximation in time $(1/\varepsilon)^{2^{O(k)}/\varepsilon^2} \cdot n d^{1+o(1)}$, where $o(1)$ hides a factor $(\log \log d)^{1.1} / \log d$. In addition, for the case of finite fields of constant size, we obtain an alternate PTAS running in time $n \cdot d^{\mathrm{poly}(k/\varepsilon)}$.
Definition 2 (Generalized Binary $\ell_0$-Rank-$k$). Given a matrix $A \in \{0, 1\}^{n \times d}$ with $n \ge d$, an integer $k$, and an inner product function $\langle \cdot, \cdot \rangle$ :
Our first result for $p = 0$ is as follows.
Theorem 2 (PTAS for $p = 0$). For any $\varepsilon \in (0, \tfrac{1}{2})$, there is a $(1 + \varepsilon)$-approximation algorithm for the Generalized Binary $\ell_0$-Rank-$k$ problem that runs in time $(1/\varepsilon)^{2^{O(k)}/\varepsilon^2} \cdot n d^{1+o(1)}$ and succeeds with constant probability, where $o(1)$ hides a factor $(\log \log d)^{1.1} / \log d$.
Hence, we obtain the first almost-linear time approximation scheme for the Generalized Binary $\ell_0$-Rank-$k$ problem, for any constant $k$. In particular, this yields the first polynomial time $(1 + \varepsilon)$-approximation for constant $k$ for $\ell_0$-low rank approximation of binary matrices when the underlying field is $\mathbb{F}_2$ or the Boolean semiring. Even for $k = 1$, no PTAS was known before.
The running time in Theorem 2 is doubly exponential in $k$, and we show below that this is necessary for any approximation algorithm for Generalized Binary $\ell_0$-Rank-$k$. However, in the special case when the base field is $\mathbb{F}_2$, or more generally $\mathbb{F}_q$ and $A$, $U$, and $V$ have entries belonging to $\mathbb{F}_q$, it is possible to obtain an algorithm running in time $n \cdot d^{\mathrm{poly}(k/\varepsilon)}$, which is an improvement for certain super-constant values of $k$ and $\varepsilon$. We formally define the problem and state our result next.
Definition 3 (Entrywise $\ell_0$-Rank-$k$ Approximation over $\mathbb{F}_q$). Given an $n \times d$ matrix $A$ with e...
Population recovery is the problem of learning an unknown distribution over an unknown set of $n$-bit strings, given access to independent draws from the distribution that have been independently corrupted according to some noise channel. Recent work has intensively studied such problems both for the bit-flip noise channel and for the erasure noise channel. In this paper we initiate the study of population recovery under the deletion channel, in which each bit $b$ is independently deleted with some fixed probability and the surviving bits are concatenated and transmitted. This is a far more challenging noise model than bit-flip noise or erasure noise; indeed, even the simplest case in which the population is of size 1 (corresponding to a trivial probability distribution supported on a single string) corresponds to the trace reconstruction problem, which is a challenging problem that has received much recent attention (see e.g. [DOS17a, NP17, PZ17, HPP18, HHP18]). In this work we give algorithms and lower bounds for population recovery under the deletion channel when the population size is some value $\ell > 1$. As our main sample complexity upper bound, we show that for any population size $\ell = o(\log n / \log \log n)$, a population of $\ell$ strings from $\{0, 1\}^n$ can be learned under deletion channel noise using $2^{n^{1/2+o(1)}}$ samples. On the lower bound side, we show that at least $n^{\Omega(\ell)}$ samples are required to perform population recovery under the deletion channel when the population size is $\ell$, for all $\ell \le n^{1/2-\varepsilon}$. Our upper bounds are obtained via a robust multivariate generalization of a polynomial-based analysis, due to Krasikov and Roditty [KR97], of how the $k$-deck of a bit-string uniquely identifies the string; this is a very different approach from recent algorithms for trace reconstruction (the $\ell = 1$ case). Our lower bounds build on moment-matching results of Roos [Roo00] and Daskalakis and Papadimitriou [DP15].
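The sampling model and the $k$-deck referred to in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's algorithm: the function names, the parameter `delta` for the deletion probability, and the representation of strings as Python `str` values are all hypothetical choices made here. The $k$-deck is taken to be the multiset of all length-$k$ subsequences of a string.

```python
import random
from collections import Counter
from itertools import combinations


def deletion_trace(x, delta, rng):
    """Delete each bit of x independently with probability delta
    and concatenate the surviving bits."""
    return "".join(b for b in x if rng.random() >= delta)


def sample_population(strings, weights, delta, rng):
    """Draw one string from the population according to weights,
    then pass it through the deletion channel."""
    x = rng.choices(strings, weights=weights, k=1)[0]
    return deletion_trace(x, delta, rng)


def k_deck(x, k):
    """Multiset (as a Counter) of all length-k subsequences of x."""
    return Counter(
        "".join(x[i] for i in idx) for idx in combinations(range(len(x)), k)
    )
```

A small example shows why $k$ must be large enough for the $k$-deck to identify a string: `k_deck("0110", 2)` and `k_deck("1001", 2)` are equal, while the 3-decks of the same two strings differ. Under the deletion channel, a trace conditioned on having length $k$ is a uniformly random element of the $k$-deck, which is what links the two notions.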
A number of recent works have considered the trace reconstruction problem, in which an unknown source string $x \in \{0, 1\}^n$ is transmitted through a probabilistic channel which may randomly delete coordinates or insert random bits, resulting in a trace of $x$. The goal is to reconstruct the original string $x$ from independent traces of $x$. While the asymptotically best algorithms known for worst-case strings use $\exp(O(n^{1/3}))$ traces [DOS17a, NP17], several highly efficient algorithms are known [PZ17, HPP18] for the average-case version of the problem, in which the source string $x$ is chosen uniformly at random from $\{0, 1\}^n$. In this paper we consider a generalization of the above-described average-case trace reconstruction problem, which we call average-case population recovery in the presence of insertions and deletions. In this problem, rather than a single unknown source string there is an unknown distribution over $s$ unknown source strings $x^1, \dots, x^s \in \{0, 1\}^n$, and each sample given to the algorithm is independently generated by drawing some $x^i$ from this distribution and outputting an independent trace of $x^i$. Building on the results of [PZ17] and [HPP18], we give an efficient algorithm for the average-case population recovery problem in the presence of insertions and deletions. For any support size $1 \le s \le \exp(\Theta(n^{1/3}))$, for a $1 - o(1)$ fraction of all $s$-element support sets $\{x^1, \dots, x^s\} \subset \{0, 1\}^n$, for every distribution $D$ supported on $\{x^1, \dots, x^s\}$, our algorithm can efficiently recover $D$ up to total variation distance at most $\varepsilon$ with high probability, given access to independent traces of independent draws from $D$ as described above. The running time of our algorithm is $\mathrm{poly}(n, s, 1/\varepsilon)$ and its sample complexity is $\mathrm{poly}(s, 1/\varepsilon, \exp(\log^{1/3} n))$. This polynomial dependence on the support size $s$ is in sharp contrast with the worst-case version of the problem (when $x^1, \dots, x^s$ may be any strings in $\{0, 1\}^n$), in which the sample complexity of the most efficient known algorithm [BCF+19] is doubly exponential in $s$.
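The sample model and the accuracy measure in this abstract can be illustrated with a short Python sketch. The abstract does not spell out the exact channel, so the version below is an assumption: before each source bit a geometric number of uniformly random bits is inserted (each with probability `insert_p`), and the source bit is then deleted with probability `delete_p`. The function names and the dictionary representation of a distribution are likewise hypothetical.

```python
import random


def indel_trace(x, delete_p, insert_p, rng):
    """One trace of x under an assumed insertion/deletion channel."""
    if not 0 <= insert_p < 1:
        raise ValueError("insert_p must lie in [0, 1)")
    out = []
    for b in x:
        # Insert a geometric number of uniformly random bits before b.
        while rng.random() < insert_p:
            out.append(rng.choice("01"))
        # Keep the source bit unless it is deleted.
        if rng.random() >= delete_p:
            out.append(b)
    return "".join(out)


def draw_sample(dist, delete_p, insert_p, rng):
    """Draw a source string from dist (a dict string -> probability)
    and return an independent trace of it."""
    strings = list(dist)
    weights = [dist[s] for s in strings]
    x = rng.choices(strings, weights=weights, k=1)[0]
    return indel_trace(x, delete_p, insert_p, rng)


def total_variation(P, Q):
    """Total variation distance between two distributions given as dicts."""
    support = set(P) | set(Q)
    return 0.5 * sum(abs(P.get(s, 0.0) - Q.get(s, 0.0)) for s in support)
```

In this sketch, a recovery algorithm would see only the outputs of `draw_sample` and would be judged by `total_variation` between its output distribution and the true one, matching the "up to total variation distance at most $\varepsilon$" guarantee stated above.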