Abstract. In many fields of research and business, data sizes are breaking the petabyte barrier. This poses new problems and opens research opportunities for the database community. Data of this size is usually stored in large clusters or clouds. Although clouds have become very popular in recent years, there is little work on benchmarking cloud applications. In this paper we present a data generator for cloud-sized applications. Its architecture makes the data generator easy to extend and to configure. A key feature is its high degree of parallelism, which allows linear scaling for arbitrary numbers of nodes. We show how distributions, relationships, and dependencies in data can be computed in parallel with linear speedup.
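The linear scaling described above is typically achieved by making each data partition computable independently of all others. A minimal sketch of this idea, assuming a deterministic per-partition seed scheme (the function and parameter names are illustrative, not the generator's actual API):

```python
import random

def generate_partition(global_seed, partition_id, rows_per_partition):
    """Generate one partition deterministically. Because each partition
    depends only on (global_seed, partition_id), any node can compute any
    partition with no coordination, giving linear scale-out."""
    # Derive an independent integer seed per partition from the global seed.
    rng = random.Random(global_seed * 1_000_003 + partition_id)
    return [
        (partition_id * rows_per_partition + i,  # globally unique key
         rng.gauss(100.0, 15.0))                 # attribute drawn from a distribution
        for i in range(rows_per_partition)
    ]

# Any two nodes producing the same partition get identical data:
p0 = generate_partition(42, 0, 4)
p1 = generate_partition(42, 1, 4)
```

Determinism also makes references across partitions cheap: a row can recompute any other row's values from its key alone, rather than looking them up, which is how relationships and dependencies stay parallelizable.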
Recent progress in computer vision has been dominated by deep neural networks trained on large amounts of labeled data. Collecting such datasets is, however, a tedious and often impossible task; hence the surge in approaches relying solely on synthetic data for training. For depth images, however, discrepancies with real scans still noticeably affect end performance. We therefore propose an end-to-end framework that simulates the whole mechanism of these devices, generating realistic depth data from 3D models by comprehensively modeling vital factors such as sensor noise, material reflectance, and surface geometry. Not only does our solution cover a wider range of sensors and achieve more realistic results than previous methods, as assessed through extensive evaluation, but we go further by measuring the impact on the training of neural networks for various recognition tasks, demonstrating how our pipeline seamlessly integrates with such architectures and consistently enhances their performance.
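To give a flavor of the kind of sensor effects such a pipeline must model, here is a toy sketch (an illustrative assumption, not the paper's actual model): Gaussian axial noise whose magnitude grows with distance, plus random pixel dropout to mimic missing measurements.

```python
import random

def add_depth_noise(depth_map, sigma_base=0.002, sigma_quad=0.003,
                    dropout=0.02, seed=0):
    """Toy depth-sensor noise model. depth_map is a 2D list of depths in
    meters; 0.0 marks an invalid pixel. Axial noise grows quadratically
    with distance, as is typical for structured-light sensors."""
    rng = random.Random(seed)
    noisy = []
    for row in depth_map:
        out = []
        for z in row:
            if z == 0.0 or rng.random() < dropout:
                out.append(0.0)  # invalid or dropped measurement
            else:
                sigma = sigma_base + sigma_quad * z * z
                out.append(max(0.0, z + rng.gauss(0.0, sigma)))
        noisy.append(out)
    return noisy

clean = [[1.0, 2.0], [0.0, 3.0]]
noisy = add_depth_noise(clean, seed=1)
```

A full simulator, as the abstract notes, additionally has to account for material reflectance and surface geometry (e.g., grazing angles causing dropout), which a per-pixel noise model like this one cannot capture.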
Given the large number of queries users send to search engines on a daily basis, the latter are likely to learn, and possibly leak, sensitive information about individual users. To address this issue, several solutions have been proposed for querying search engines in a privacy-preserving way. A first category of solutions aims to hide users' identities, thus enforcing unlinkability between a query and the identity of its originating user. A second category aims to obfuscate the content of users' queries, or to generate fake queries that blur user profiles, thus enforcing indistinguishability between them. In this paper we propose PEAS, a new protocol for private Web search. PEAS combines a new efficient unlinkability protocol with a new accurate indistinguishability protocol. Experiments conducted on a real dataset of search logs show that, compared to state-of-the-art approaches, PEAS decreases the number of queries linked to their original requesters by up to 81.9%. Furthermore, PEAS is accurate: it allows users to retrieve up to 95.3% of the results they would obtain by using search engines in an unprotected way.
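The indistinguishability idea mentioned above can be sketched generically: mix the real query with decoys so the search engine cannot tell which one reflects the user's interest. The function, parameters, and decoy pool below are purely hypothetical illustrations of the general technique, not the PEAS protocol itself.

```python
import random

def obfuscate(real_query, decoy_pool, k=2, seed=None):
    """Send the real query hidden among k fake queries sampled from a
    decoy pool; the shuffled batch gives the server no positional hint.
    The client later discards the results of the decoy queries locally."""
    rng = random.Random(seed)
    batch = rng.sample(decoy_pool, k) + [real_query]
    rng.shuffle(batch)
    return batch

batch = obfuscate("rare disease treatment",
                  ["weather", "football scores", "pasta recipe"],
                  k=2, seed=7)
```

The accuracy/privacy trade-off reported in the abstract stems from exactly this design: more or better-targeted fake queries blur the profile further, but risk degrading the relevance of the results the user actually keeps.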
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and indicate whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and by the National Institute on Drug Abuse of the National Institutes of Health.