Professional use of cloud health storage around the world calls for information-retrieval extensions. These extensions should help users find what they need among thousands or even billions of enterprise documents and reports. However, they must also offer protection against existing threats, for instance hackers, server administrators, and service providers who use people's personal data for their own purposes. Indeed, cloud servers keep traces of user activities and queries, which expose users to network attackers. Cloud servers themselves can also use those traces to adapt or personalize their platforms without users' consent. To address this, we suggest implementing Private Information Retrieval (PIR) protocols that ease the retrieval task while securing it against both servers and hackers. We study the effectiveness of this solution by evaluating information retrieval time, recall, and precision. The experimental results show that our framework ensures a reasonable and acceptable level of confidentiality when retrieving data through cloud services.
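The abstract does not say which PIR protocol the framework uses. To illustrate what "private" retrieval means, here is a minimal sketch of the classic two-server, information-theoretic XOR scheme. It assumes two non-colluding servers that each hold an identical copy of the database and fixed-length records; the function names are hypothetical and this is not the authors' implementation.

```python
import secrets


def client_queries(n: int, index: int) -> tuple[list[int], list[int]]:
    """Build two query bit vectors that differ only at the wanted index.

    Each vector on its own is uniformly random, so neither server can tell
    which record is being requested (assuming the servers do not collude).
    """
    q1 = [secrets.randbits(1) for _ in range(n)]
    q2 = q1.copy()
    q2[index] ^= 1
    return q1, q2


def server_answer(db: list[bytes], query: list[int]) -> bytes:
    """XOR together every record whose query bit is 1 (records share one length)."""
    answer = bytes(len(db[0]))
    for bit, record in zip(query, db):
        if bit:
            answer = bytes(a ^ b for a, b in zip(answer, record))
    return answer


def pir_retrieve(db: list[bytes], index: int) -> bytes:
    """Client side: query both replicas and XOR their answers.

    Records selected by both queries cancel out, leaving only db[index].
    """
    q1, q2 = client_queries(len(db), index)
    a1 = server_answer(db, q1)  # sent to server 1
    a2 = server_answer(db, q2)  # sent to server 2, which holds an identical copy
    return bytes(x ^ y for x, y in zip(a1, a2))


db = [b"rec-A", b"rec-B", b"rec-C", b"rec-D"]
print(pir_retrieve(db, 2))  # b'rec-C'
```

This toy version sends one bit per record to each server, so its communication cost grows linearly with the database; practical PIR schemes reduce that cost, which is one reason retrieval time is worth measuring alongside recall and precision.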
Finding genomic repeats, i.e., searching strings for patterns of repeated base pairs in a Deoxyribonucleic Acid (DNA) sequence, requires a long processing time. This research builds a big-data computational model that searches for string patterns by modifying and implementing the Boyer-Moore algorithm on Apache Spark Streaming, applied to human DNA sequences from the Ensembl site. We also perform experiments on cloud computing, using computer clusters with different specifications and datasets of human DNA sequences. The results show that the proposed computational model on Apache Spark Streaming is faster than both standalone computing and multicore parallel computing. Therefore, the main contribution of this research, a computational model that reduces computational costs, has been achieved.
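The abstract does not detail how Boyer-Moore was modified for Spark Streaming. As a rough sketch of the two ingredients, the code below shows a simplified Boyer-Moore search that uses only the bad-character rule (the full algorithm also applies the good-suffix rule) and a hypothetical `search_in_chunks` helper that splits a sequence into overlapping chunks, the way a streaming or parallel job might partition DNA input. Plain Python stands in for Spark here; no Spark API is used or implied.

```python
def boyer_moore_search(text: str, pattern: str) -> list[int]:
    """Return every start offset of pattern in text (bad-character rule only)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Rightmost index of each symbol in the pattern, e.g. A/C/G/T for DNA.
    last = {ch: i for i, ch in enumerate(pattern)}
    matches = []
    s = 0  # current alignment of the pattern against the text
    while s <= n - m:
        j = m - 1
        # Compare right to left, as Boyer-Moore does.
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            matches.append(s)
            # Align the next text symbol with its rightmost occurrence in the pattern.
            s += m - last.get(text[s + m], -1) if s + m < n else 1
        else:
            # Align the mismatched text symbol with its rightmost occurrence.
            s += max(1, j - last.get(text[s + j], -1))
    return matches


def search_in_chunks(sequence: str, pattern: str, chunk_size: int) -> list[int]:
    """Hypothetical partitioned search, standing in for a streaming job.

    Each chunk is extended by len(pattern) - 1 symbols so a match that crosses
    a chunk boundary is still found, and found exactly once. Chunks are
    independent, so the per-chunk searches could run in parallel.
    """
    m = len(pattern)
    hits = []
    for start in range(0, len(sequence), chunk_size):
        chunk = sequence[start:start + chunk_size + m - 1]
        hits.extend(start + pos for pos in boyer_moore_search(chunk, pattern))
    return hits


dna = "ACGTACGTTACGACGT"
print(boyer_moore_search(dna, "ACG"))   # [0, 4, 9, 12]
print(search_in_chunks(dna, "ACG", 5))  # [0, 4, 9, 12]
```

The overlap of `len(pattern) - 1` symbols is the key design choice: it is the smallest overlap that guarantees no boundary-crossing match is lost, while each match is still reported by only the chunk in which it starts.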
Motif discovery in DNA sequences is one of the most important problems in bioinformatics, so accurate and fast algorithms for it have long been a research goal. This study modifies the random projection algorithm so that it can run on R high-performance computing (i.e., the R package pbdMPI). This involves several steps: preprocessing the data, splitting the data according to the number of batches, modifying and implementing random projection in the pbdMPI package, and aggregating the results. To validate the proposed approach, we conducted experiments on several benchmark datasets, including a sensitivity analysis on the number of cores and batches. The results show that computational cost can be reduced: with 6 cores, the computation is around 34 times faster than in standalone mode. Thus, the proposed approach can be used for motif discovery effectively and efficiently.
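The steps above (split into batches, run random projection per batch, aggregate) can be sketched as follows. This is a minimal Python illustration of the random projection idea for motif finding introduced by Buhler and Tompa: for each trial, pick `k` of the `l` motif positions at random, hash every l-mer by the symbols at those positions, and keep buckets holding at least `threshold` l-mers. It is not the authors' R/pbdMPI code: each batch would correspond to work on one MPI rank, and `merge_buckets` stands in for the aggregation step. All parameter and function names are hypothetical, and the original algorithm's refinement stage is replaced by a crude majority `consensus`.

```python
import random
from collections import Counter, defaultdict


def bucket_lmers(sequences, l, positions, offset=0):
    """Hash every l-mer by the symbols at the projected positions.

    Returns {projection_key: [(sequence_index, start), ...]}; offset keeps
    sequence indices global when only one batch is passed in.
    """
    buckets = defaultdict(list)
    for idx, seq in enumerate(sequences, start=offset):
        for start in range(len(seq) - l + 1):
            key = "".join(seq[start + p] for p in positions)
            buckets[key].append((idx, start))
    return buckets


def split_batches(sequences, n_batches):
    """Split into contiguous batches, returned as (offset, batch) pairs."""
    size = max(1, -(-len(sequences) // n_batches))  # ceiling division
    return [(b, sequences[b:b + size]) for b in range(0, len(sequences), size)]


def merge_buckets(partials):
    """Aggregation step: concatenate the per-batch hit lists of each bucket."""
    merged = defaultdict(list)
    for part in partials:
        for key, hits in part.items():
            merged[key].extend(hits)
    return merged


def random_projection(sequences, l, k, threshold, n_trials=10, n_batches=3, seed=0):
    """Return (positions, key, hits) for every bucket holding >= threshold l-mers."""
    rng = random.Random(seed)
    enriched = []
    for _ in range(n_trials):
        positions = tuple(sorted(rng.sample(range(l), k)))  # k of the l motif positions
        partials = [bucket_lmers(batch, l, positions, offset)
                    for offset, batch in split_batches(sequences, n_batches)]
        for key, hits in merge_buckets(partials).items():
            if len(hits) >= threshold:
                enriched.append((positions, key, hits))
    return enriched


def consensus(sequences, hits, l):
    """Majority symbol per column of the l-mers in one bucket (a crude motif estimate)."""
    columns = zip(*(sequences[i][s:s + l] for i, s in hits))
    return "".join(Counter(col).most_common(1)[0][0] for col in columns)
```

Because only `k` positions are hashed, occurrences of a motif that carry mutations at the other `l - k` positions still fall into the same bucket, which is what lets projection tolerate imperfect motif copies. Splitting into batches does not change the result here, since the merged buckets are identical to a single-pass computation; only where the work runs changes.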