In data mining applications and in spatial and multimedia databases, a useful tool is the kNN join, which produces, for every point in a dataset R, its k nearest neighbors (NN) from a dataset S. Since it involves both a join and an NN search, performing kNN joins efficiently is a challenging task. Meanwhile, applications continue to witness a rapid (in some cases exponential) increase in the amount of data to be processed. A popular model nowadays for large-scale data processing is the shared-nothing cluster of commodity machines running MapReduce [6]. Hence, how to execute kNN joins efficiently on large data stored in a MapReduce cluster is an intriguing problem that meets many practical needs. This work proposes novel (exact and approximate) algorithms in MapReduce to perform efficient parallel kNN joins on large data. We demonstrate our ideas using Hadoop. Extensive experiments on large real and synthetic datasets, with tens or hundreds of millions of records in both R and S and up to 30 dimensions, have demonstrated the efficiency, effectiveness, and scalability of our methods.
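To make the kNN join definition concrete, here is a minimal single-machine sketch in Python. It is a brute-force baseline, not the MapReduce algorithms the abstract proposes; the function name `knn_join`, the Euclidean metric, and the toy points are assumptions made for illustration.

```python
import heapq
import math


def knn_join(R, S, k):
    """For each point r in R, return the indices of its k nearest points in S.

    Brute-force baseline (assumed Euclidean distance): every r is compared
    with every s, so the cost is O(|R| * |S|) distance computations.
    """
    result = []
    for r in R:
        # nsmallest returns the k indices ordered from nearest to farthest.
        nearest = heapq.nsmallest(
            k, range(len(S)), key=lambda j: math.dist(r, S[j])
        )
        result.append(nearest)
    return result


# Toy data (hypothetical): two query points in R, five candidate points in S.
R = [(0.0, 0.0), (5.0, 5.0)]
S = [(1.0, 0.0), (0.0, 2.0), (4.0, 4.0), (6.0, 5.0), (10.0, 10.0)]
neighbors = knn_join(R, S, k=2)
print(neighbors)
```

The quadratic number of distance computations in this baseline is what makes the join expensive on large R and S, which is the cost a partitioned, parallel formulation aims to spread across machines.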
Because massive amounts of data are distributed across multiple locations, distributed machine learning has attracted considerable research interest. The Alternating Direction Method of Multipliers (ADMM) is a powerful method for designing distributed machine learning algorithms, whereby each agent computes over its local dataset and exchanges computation results with its neighboring agents in an iterative procedure. This iterative process can leak significant private information if the local data are sensitive. In this paper, we propose a differentially private ADMM algorithm (P-ADMM) that provides dynamic zero-concentrated differential privacy (dynamic zCDP) by inserting Gaussian noise with linearly decaying variance. We prove that P-ADMM has the same convergence rate as its non-private counterpart, i.e., O(1/K), with K being the number of iterations, for general convex problems and linear convergence for strongly convex problems, while providing a differential privacy guarantee. Moreover, through experiments performed on real-world datasets, we empirically show that P-ADMM has the best-known performance among the existing differentially private ADMM-based algorithms.
• We propose a differentially private ADMM algorithm (P-ADMM) by introducing Gaussian noise with a linearly decaying variance to address the privacy concerns in distributed machine learning over large datasets.
• We introduce a new privacy framework to quantify the privacy leakage in a distributed and iterative setting,
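The idea of Gaussian noise with linearly decaying variance can be illustrated with a small Python sketch. The exact schedule used by P-ADMM is not given in the abstract, so the schedule below (variance falling linearly from `sigma0_sq` at iteration 1 to `sigma0_sq / K` at iteration K), the unit sensitivity, and all function names are assumptions. The privacy accounting uses the standard zCDP facts that a Gaussian mechanism with sensitivity Δ and variance σ² satisfies ρ = Δ²/(2σ²) and that ρ adds up under composition; the paper's dynamic zCDP analysis may differ.

```python
import random


def linearly_decaying_variance(sigma0_sq, k, K):
    """Hypothetical schedule: variance at iteration k (1-based) out of K."""
    return sigma0_sq * (K - k + 1) / K


def perturb(values, variance, rng):
    """Add zero-mean Gaussian noise with the given variance to each entry."""
    std = variance ** 0.5
    return [v + rng.gauss(0.0, std) for v in values]


def zcdp_rho(sensitivity, variance):
    """Standard zCDP cost of one Gaussian mechanism release."""
    return sensitivity ** 2 / (2.0 * variance)


K = 4
sigma0_sq = 4.0
sensitivity = 1.0  # assumed L2 sensitivity of the shared quantity
rng = random.Random(0)

local_result = [0.5, -1.0, 2.0]  # stand-in for an agent's local update
variances = [linearly_decaying_variance(sigma0_sq, k, K) for k in range(1, K + 1)]
released = [perturb(local_result, var, rng) for var in variances]

# Total privacy cost over K iterations under additive zCDP composition.
total_rho = sum(zcdp_rho(sensitivity, var) for var in variances)
print(variances, total_rho)
```

Because the noise shrinks over iterations, later releases cost more privacy budget per step than earlier ones, which is why the per-iteration cost has to be summed rather than multiplied by K.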
An imprecise probabilistic framework for design flood estimation is proposed on the basis of the Dempster-Shafer theory to handle different epistemic uncertainties from data, probability distribution functions, and probability distribution parameters. These uncertainties are incorporated in cost-benefit analysis to generate the lower and upper bounds of the total cost for flood control, thus presenting improved information for decision making on design floods. Within the total cost bounds, a new robustness criterion is proposed to select a design flood that can tolerate higher levels of uncertainty. A variance decomposition approach is used to quantify individual and interactive impacts of the uncertainty sources on total cost. Results from three case studies, with 127-, 104-, and 54-year flood data sets, respectively, show that the imprecise probabilistic approach effectively combines aleatory and epistemic uncertainties from the various sources and provides upper and lower bounds of the total cost. A clear trade-off is identified between the total cost and the robustness of design floods, which goes beyond the information that the conventional minimum cost criterion can provide. The interactions among data, distributions, and parameters contribute much more than parameters alone to the estimate of the total cost. It is found that the contributions of the various uncertainty sources and their interactions vary with flood magnitude but remain roughly the same across return periods. This study demonstrates that the proposed methodology can effectively incorporate epistemic uncertainties in cost-benefit analysis of design floods.
Fast detection of pipe bursts in water distribution systems (WDSs) could improve customer satisfaction, increase the profits of water supply, and, more importantly, reduce the loss of water resources. Therefore, sensor placement for pipe burst detection in WDSs has been a crucial issue for researchers and practitioners. This paper presents an economic evaluation indicator called net cost, based on cost–benefit analysis, to solve the optimal pressure sensor placement problem. The net cost is defined as the sum of the normalized optimal detection uncovering rate and the normalized investment cost of sensors. The optimal detection uncovering rate and the optimal set of sensor locations are determined through a single-objective optimization model that maximizes the detection coverage rate under a fixed number of sensors. The optimal number of sensors is then determined by analyzing the relationship between the net cost and the number of sensors. The proposed method is demonstrated to be effective in determining both the optimal number of sensors and their locations on the benchmark network Net3. Moreover, the sensor accuracy and pipe burst flow magnitude are shown to be key uncertainties in determining the optimal number of sensors.
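The two-stage procedure described above (maximize coverage for a fixed number of sensors, then pick the number of sensors that minimizes net cost) can be sketched in Python as follows. This is an illustration under stated assumptions: a greedy heuristic stands in for the paper's single-objective optimization model, the investment cost is normalized as `n / max_sensors`, and the node names and detectable burst events are invented toy data.

```python
def greedy_placement(detects, n_sensors):
    """Greedily pick sensor nodes that each add the most newly covered events.

    Stand-in heuristic (assumption), not the optimization model of the paper.
    `detects` maps a candidate node to the set of burst events it can detect.
    """
    chosen, covered = [], set()
    candidates = sorted(detects)  # fixed order makes tie-breaking deterministic
    for _ in range(n_sensors):
        best = max(
            (c for c in candidates if c not in chosen),
            key=lambda c: len(detects[c] - covered),
            default=None,
        )
        if best is None:  # no candidate nodes left
            break
        chosen.append(best)
        covered |= detects[best]
    return chosen, covered


def net_cost_curve(detects, n_events, max_sensors):
    """Net cost = uncovering rate + normalized investment, for each sensor count."""
    curve = {}
    for n in range(max_sensors + 1):
        chosen, covered = greedy_placement(detects, n)
        uncovering_rate = 1.0 - len(covered) / n_events  # share of undetected bursts
        investment = n / max_sensors  # assumed linear, normalized to [0, 1]
        curve[n] = (uncovering_rate + investment, chosen)
    return curve


# Toy network (hypothetical): 4 candidate nodes, 6 simulated burst events.
detects = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {6}, "D": {1, 6}}
curve = net_cost_curve(detects, n_events=6, max_sensors=4)
best_n = min(curve, key=lambda n: curve[n][0])
print(best_n, curve[best_n])
```

In this toy case the first sensor removes most of the uncovered events, while each further sensor adds more investment cost than it saves in uncovering rate, so the net cost is lowest with a single sensor.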
A novel photoacoustic thermometric method is presented for simultaneously imaging cells and sensing their temperature. With three-seconds-per-frame imaging speed, a temperature resolution of 0.2°C was achieved in a photo-thermal cell heating experiment. Compared to other approaches, the photoacoustic thermometric method has the advantage of not requiring custom-developed temperature-sensitive biosensors. This feature should facilitate the conversion of single-cell thermometry into a routine lab tool and make it accessible to a much broader biological research community.