Publicly accessible databases are an indispensable resource for retrieving up-to-date information. But they also pose a significant risk to the privacy of the user, since a curious database operator can follow the user's queries and infer what the user is after. Indeed, in cases where the users' intentions are to be kept secret, users are often cautious about accessing the database. It can be shown that when accessing a single database, to completely guarantee the privacy of the user, the whole database should be down-loaded; namely
n
bits should be communicated (where
n
is the number of bits in the database).
In this work, we investigate whether by replicating the database, more efficient solutions to the private retrieval problem can be obtained. We describe schemes that enable a user to access
k
replicated copies of a database (
k
≥2) and
privately
retrieve information stored in the database. This means that each individual server (holding a replicated copy of the database) gets no information on the identity of the item retrieved by the user. Our schemes use the replication to gain substantial saving. In particular, we present a two-server scheme with communication complexity
O(n
1/3
).
This paper concerns the discovery of patterns in gene expression matrices, in which each element gives the expression level of a given gene in a given experiment. Most existing methods for pattern discovery in such matrices are based on clustering genes by comparing their expression levels in all experiments, or clustering experiments by comparing their expression levels for all genes. Our work goes beyond such global approaches by looking for local patterns that manifest themselves when we focus simultaneously on a subset of the genes and a subset Ì of the experiments. Specifically, we look for order-preserving submatrices (OPSMs), in which the expression levels of all genes induce the same linear ordering of the experiments (we show that the OPSM search problem is NP-hard in the worst case). Such a pattern might arise, for example, if the experiments in Ì represent distinct stages in the progress of a disease or in a cellular process, and the expression levels of all genes in vary across the stages in the same way.We define a probabilistic model in which an OPSM is hidden within an otherwise random matrix. Guided by this model we develop an efficient algorithm for finding the hidden OPSM in the random matrix. In data generated according to the model the algorithm recovers the hidden OPSM with very high success rate. Application of the methods to breast cancer data seem to reveal significant local patterns. zohar yakhini@agilent.com.Our algorithm can be used to discover more than one OPSM within the same data set, even when these OPSMs overlap. It can also be adapted to handle relaxations and extensions of the OPSM condition. For example, we may allow the different rows of ¢ Ì to induce similar but not identical orderings of the columns, or we may allow the set Ì to include more than one representative of each stage of a biological process.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.