Predicting patients' risk of developing certain diseases is an important research topic in healthcare. Accurately identifying and ranking the similarity among patients based on their historical records is a key step in personalized healthcare. The electric health records (EHRs), which are irregularly sampled and have varied patient visit lengths, cannot be directly used to measure patient similarity due to the lack of an appropriate representation. Moreover, there needs an effective approach to measure patient similarity on EHRs. In this paper, we propose two novel deep similarity learning frameworks which simultaneously learn patient representations and measure pairwise similarity. We use a convolutional neural network (CNN) to capture local important information in EHRs and then feed the learned representation into triplet loss or softmax cross entropy loss. After training, we can obtain pairwise distances and similarity scores. Utilizing the similarity information, we then perform disease predictions and patient clustering. Experimental results show that CNN can better represent the longitudinal EHR sequences, and our proposed frameworks outperform state-of-the-art distance metric learning methods.
Pairwise learning has received much attention recently as it is more capable of modeling the relative relationship between pairs of samples. Many machine learning tasks can be categorized as pairwise learning, such as AUC maximization and metric learning. Existing techniques for pairwise learning all fail to take into consideration a critical issue in their design, i.e., the protection of sensitive information in the training set. Models learned by such algorithms can implicitly memorize the details of sensitive information, which offers opportunity for malicious parties to infer it from the learned models. To address this challenging issue, in this paper, we propose several differentially private pairwise learning algorithms for both online and offline settings. Specifically, for the online setting, we first introduce a differentially private algorithm (called OnPairStrC) for strongly convex loss functions. Then, we extend this algorithm to general convex loss functions and give another differentially private algorithm (called OnPairC). For the offline setting, we also present two differentially private algorithms (called OffPairStrC and OffPairC) for strongly and general convex loss functions, respectively. These proposed algorithms can not only learn the model effectively from the data but also provide strong privacy protection guarantee for sensitive information in the training set. Extensive experiments on real-world datasets are conducted to evaluate the proposed algorithms and the experimental results support our theoretical analysis.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.