Chi Zhang scite author profile

In data mining applications and in spatial and multimedia databases, a useful tool is the kNN join, which produces the k nearest neighbors (NN), from a dataset S, of every point in a dataset R. Since it involves both the join and the NN search, performing kNN joins efficiently is a challenging task. Meanwhile, applications continue to witness a rapid (in some cases exponential) increase in the amount of data to be processed. A popular model nowadays for large-scale data processing is the shared-nothing cluster of commodity machines running MapReduce [6]. Hence, how to execute kNN joins efficiently on large data stored in a MapReduce cluster is an intriguing problem that meets many practical needs. This work proposes novel (exact and approximate) algorithms in MapReduce to perform efficient parallel kNN joins on large data. We demonstrate our ideas using Hadoop. Extensive experiments on large real and synthetic datasets, with tens or hundreds of millions of records in both R and S and up to 30 dimensions, demonstrate the efficiency, effectiveness, and scalability of our methods.
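To make the definition of the kNN join in the abstract above concrete, here is a minimal single-machine sketch. It is a brute-force illustration of what the operation returns, not the MapReduce algorithms the paper proposes. The function name `knn_join`, the use of Euclidean distance, and the sample points are all assumptions made for illustration.

```python
import heapq
import math

def knn_join(R, S, k):
    """For each point r in R, return the k points of S closest to r.

    Brute force: every r is compared against every s, so the cost grows
    with len(R) * len(S). Euclidean distance is assumed here.
    """
    result = []
    for r in R:
        # nsmallest returns the k closest points, nearest first.
        nearest = heapq.nsmallest(k, S, key=lambda s: math.dist(r, s))
        result.append((r, nearest))
    return result

# Hypothetical toy data, not taken from the paper.
R = [(0.0, 0.0), (5.0, 5.0)]
S = [(1.0, 0.0), (0.0, 2.0), (4.0, 4.0), (6.0, 5.0), (10.0, 10.0)]
joined = knn_join(R, S, k=2)
for r, neighbors in joined:
    print(r, neighbors)
```

The quadratic number of distance computations in this sketch is the reason the operation becomes expensive on large R and S.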


Differentially Private Robust ADMM for Distributed Machine Learning

Ding, Zhang, Chen, et al. 2019


Because massive amounts of data are distributed across multiple locations, distributed machine learning has attracted considerable research interest. The Alternating Direction Method of Multipliers (ADMM) is a powerful method for designing distributed machine learning algorithms, whereby each agent computes over local datasets and exchanges computation results with its neighbor agents in an iterative procedure. Significant privacy leakage can occur during this iterative process if the local data is sensitive. In this paper, we propose a differentially private ADMM algorithm (P-ADMM) that provides dynamic zero-concentrated differential privacy (dynamic zCDP) by inserting Gaussian noise with linearly decaying variance. We prove that P-ADMM has the same convergence rate as its non-private counterpart, i.e., O(1/K), with K being the number of iterations, for general convex problems and linear convergence for strongly convex problems, while providing a differential privacy guarantee. Moreover, through experiments performed on real-world datasets, we empirically show that P-ADMM has the best-known performance among existing differentially private ADMM-based algorithms.

• We propose a differentially private ADMM algorithm (P-ADMM) by introducing a Gaussian noise with a linearly decaying variance to address the privacy concerns in distributed machine learning over large datasets.

• We introduce a new privacy framework to quantify the privacy leakage in a distributed and iterative setting,
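The phrase "Gaussian noise with linearly decaying variance" in the abstract above can be illustrated with a small sketch. The abstract does not give the exact schedule, so the linear interpolation between a starting and an ending variance is an assumption, as are the names `linear_variance` and `perturb` and all numeric values. The sketch covers only the noise injection. It does not implement ADMM, and it does not calibrate the noise to any zCDP budget.

```python
import random

def linear_variance(k, num_iters, var_start, var_end):
    """Noise variance at iteration k (0-based).

    Assumed schedule: a straight line from var_start at the first
    iteration down to var_end at the last one.
    """
    if num_iters == 1:
        return var_start
    frac = k / (num_iters - 1)
    return var_start + frac * (var_end - var_start)

def perturb(vector, variance, rng):
    """Add independent zero-mean Gaussian noise to each coordinate."""
    std = variance ** 0.5
    return [x + rng.gauss(0.0, std) for x in vector]

rng = random.Random(0)  # fixed seed so runs are repeatable
num_iters = 5
schedule = [linear_variance(k, num_iters, 1.0, 0.2) for k in range(num_iters)]

# Hypothetical local update that an agent would share with its neighbors.
local_update = [1.0, 2.0, 3.0]
noisy_updates = [perturb(local_update, v, rng) for v in schedule]
```

Under this assumed schedule, later iterations receive less noise than earlier ones.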


Imprecise probabilistic estimation of design floods with epistemic uncertainties

Zhang et al. 2016, Water Resources Research


An imprecise probabilistic framework for design flood estimation is proposed on the basis of the Dempster-Shafer theory to handle different epistemic uncertainties from data, probability distribution functions, and probability distribution parameters. These uncertainties are incorporated in cost-benefit analysis to generate the lower and upper bounds of the total cost for flood control, thus presenting improved information for decision making on design floods. Within the total cost bounds, a new robustness criterion is proposed to select a design flood that can tolerate higher levels of uncertainty. A variance decomposition approach is used to quantify individual and interactive impacts of the uncertainty sources on total cost. Results from three case studies, with 127-, 104-, and 54-year flood data sets, respectively, show that the imprecise probabilistic approach effectively combines aleatory and epistemic uncertainties from the various sources and provides upper and lower bounds of the total cost. A clear trade-off is identified between the total cost and the robustness of design floods, which goes beyond the information that the conventional minimum cost criterion can provide. The interactions among data, distributions, and parameters have a much higher contribution than parameters to the estimate of the total cost. It is found that the contributions of the various uncertainty sources and their interactions vary with flood magnitude, but remain roughly the same across return periods. This study demonstrates that the proposed methodology can effectively incorporate epistemic uncertainties in cost-benefit analysis of design floods.


Optimal sensor placement for pipe burst detection in water distribution systems using cost–benefit analysis

Zhao, Zhang, Liu, et al. 2020


Fast detection of pipe bursts in water distribution systems (WDSs) could improve customer satisfaction, increase the profits of water supply and, more importantly, reduce the loss of water resources. Therefore, sensor placement for pipe burst detection in WDSs has been a crucial issue for researchers and practitioners. This paper presents an economic evaluation indicator, named net cost and based on cost–benefit analysis, to solve the optimal pressure sensor placement problem. The net cost is defined as the sum of the normalized optimal detection uncovering rate and investment cost of sensors. The optimal detection uncovering rate and the optimal set of sensor locations are determined through a single-objective optimization model that maximizes the detection coverage rate under a fixed number of sensors. The optimal number of sensors is then determined by analyzing the relationship between the net cost and the number of sensors. The proposed method is demonstrated to be effective in determining both the optimal number of sensors and their locations on a benchmark network, Net3. Moreover, the sensor accuracy and pipe burst flow magnitude are shown to be key uncertainties in determining the optimal number of sensors.
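The net cost indicator described in the abstract above can be sketched as follows. The abstract does not specify the normalization or the selection rule, so three things here are assumptions: both terms are scaled by their maximum, the sensor count with the smallest net cost is selected, and the function name `net_costs` is hypothetical. All numbers are made-up illustrative values, not results from the paper.

```python
def net_costs(uncovering_rates, sensor_costs):
    """Net cost per candidate sensor count.

    Assumption: both terms are normalized by dividing by their maximum,
    then summed. The paper's exact normalization may differ.
    """
    max_rate = max(uncovering_rates)
    max_cost = max(sensor_costs)
    return [u / max_rate + c / max_cost
            for u, c in zip(uncovering_rates, sensor_costs)]

# Hypothetical inputs: for each number of sensors, the best achievable
# uncovering rate (share of bursts left undetected) and the investment cost.
sensor_counts = [1, 2, 3, 4, 5]
uncovering = [0.60, 0.35, 0.20, 0.15, 0.12]
costs = [1.0, 2.0, 3.0, 4.0, 5.0]

nc = net_costs(uncovering, costs)
# Assumed selection rule: take the sensor count with the smallest net cost.
best = sensor_counts[min(range(len(nc)), key=nc.__getitem__)]
print(best)
```

With these illustrative inputs the uncovering rate falls quickly at first and then flattens while the cost keeps rising, so the minimum net cost lands at an intermediate sensor count.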


Single-cell photoacoustic thermometry

Gao, Wang, et al. 2013


A novel photoacoustic thermometric method is presented for simultaneously imaging cells and sensing their temperature. With an imaging speed of three seconds per frame, a temperature resolution of 0.2°C was achieved in a photo-thermal cell heating experiment. Compared to other approaches, the photoacoustic thermometric method has the advantage of not requiring custom-developed temperature-sensitive biosensors. This feature should facilitate the conversion of single-cell thermometry into a routine lab tool and make it accessible to a much broader biological research community.




Chi Zhang

Efficient parallel kNN joins for large data in MapReduce

Differentially Private Robust ADMM for Distributed Machine Learning

Imprecise probabilistic estimation of design floods with epistemic uncertainties

Optimal sensor placement for pipe burst detection in water distribution systems using cost–benefit analysis

Single-cell photoacoustic thermometry
