With the widespread use of mobile devices, location-based services (LBSs), which provide useful services tailored to users' locations, have become indispensable to daily life. However, along with these benefits, LBSs also create problems for users: to use an LBS, a user must disclose sensitive location information to the service provider. Hence, several studies have focused on protecting the location privacy of individual LBS users. Geo-indistinguishability (Geo-I), which is based on the well-known differential privacy, has recently emerged as a de facto privacy definition for protecting location data in LBSs. However, LBS providers require aggregate statistics, such as the user density distribution, to improve their service quality, and deriving these accurately from the location data received from users is difficult owing to the perturbation introduced by Geo-I. Thus, in this study, we investigated two approaches, one based on the expectation-maximization (EM) algorithm and one based on deep learning, with the aim of precisely computing the density distribution of LBS users while preserving the privacy of location datasets. The evaluation results show that the deep learning approach significantly outperforms the alternatives at all privacy protection levels. Furthermore, when a low level of privacy protection suffices, the EM-based approach performs similarly to the deep learning solution and can be used in its place, particularly when training datasets are unavailable.
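Geo-I is commonly realized with the planar Laplace mechanism, which is the kind of perturbation the abstract refers to: each reported location is the true location plus noise with a uniformly random direction and a radius drawn from the polar Laplacian, whose inverse CDF involves the -1 branch of the Lambert W function. A minimal, self-contained sketch (the bisection-based Lambert W helper and the parameter values are illustrative, not taken from the paper):

```python
import math
import random

def lambertw_m1(z):
    """Lambert W, branch -1, for z in [-1/e, 0), via bisection on w*exp(w) = z."""
    lo, hi = -2.0, -1.0
    while lo * math.exp(lo) <= z:        # push lo left until g(lo) > z
        lo *= 2
    for _ in range(100):                 # g(w) = w*exp(w) is decreasing on (-inf, -1]
        mid = (lo + hi) / 2
        if mid * math.exp(mid) > z:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def geo_i_perturb(x, y, epsilon, rng):
    """Planar Laplace mechanism of Geo-I: uniform angle, radius via inverse CDF."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    p = rng.random()
    # Inverse CDF of the radius: C^{-1}(p) = -(1/eps) * (W_{-1}((p - 1)/e) + 1)
    r = -(1.0 / epsilon) * (lambertw_m1((p - 1.0) / math.e) + 1.0)
    return x + r * math.cos(theta), y + r * math.sin(theta)
```

Smaller `epsilon` means larger expected displacement, which is exactly why aggregate statistics such as density maps become hard to recover from the perturbed reports.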
Deep learning has progressively become the spotlight of innovations that leverage the clinical time-series data longitudinally recorded in Electronic Health Records (EHRs) to forecast patient survival and vital-sign deterioration. However, the recording velocity and noisiness of these data hinder the proper adoption of recently proposed benchmarks. Recurrent neural networks (RNNs), especially long short-term memory networks (LSTMs), have achieved better results in recent studies, but they are hard to train and interpret and fail to properly capture long-term dependencies. Moreover, RNNs struggle with clinical time series because their sequential processing precludes parallelization. Recently, the Transformer architecture was proposed for natural language processing (NLP) tasks and achieved state-of-the-art results. Hence, to tackle the drawbacks of RNNs, we propose a clinical time-series Multi-head Transformer (MHT), a Transformer-based model that forecasts a patient's future time-series variables from the vital signs. To demonstrate the generality of the model, we apply the same architecture to other critical tasks that describe the progression of intensive care unit (ICU) patients and the associated risks: the remaining length of stay (LoS), in-hospital mortality, and 24-hour mortality. Our model achieves an area under the receiver operating characteristic curve (AUC-ROC) of 0.98 and an area under the precision-recall curve (AUC-PR) of 0.424 for vital time-series prediction, and an AUC-ROC of 0.875 for mortality prediction. The model performs well for frequently recorded variables such as heart rate (HR) and performs only on par with its LSTM counterparts for intermittently captured records such as the white blood cell count (WBC).
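The property the abstract contrasts with RNNs, attending to all time steps at once instead of scanning them sequentially, can be sketched with a plain NumPy multi-head self-attention layer. Shapes and names below are illustrative; this is a generic attention sketch, not a reproduction of the paper's MHT model:

```python
import numpy as np

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """Scaled dot-product multi-head self-attention over a time series.

    X: (T, d_model) sequence of feature vectors (e.g. vital signs per time step).
    Wq, Wk, Wv, Wo: (d_model, d_model) projection matrices.
    All T time steps are processed in parallel, unlike an RNN's sequential scan.
    """
    T, d_model = X.shape
    d_head = d_model // n_heads
    # Project, then split the model dimension into heads: (H, T, d_head)
    Q = (X @ Wq).reshape(T, n_heads, d_head).transpose(1, 0, 2)
    K = (X @ Wk).reshape(T, n_heads, d_head).transpose(1, 0, 2)
    V = (X @ Wv).reshape(T, n_heads, d_head).transpose(1, 0, 2)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)          # (H, T, T)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)               # row-wise softmax
    heads = weights @ V                                          # (H, T, d_head)
    concat = heads.transpose(1, 0, 2).reshape(T, d_model)        # re-join the heads
    return concat @ Wo
```

Because every pair of time steps is compared directly in the `(T, T)` score matrix, distant observations influence each other in one step, which is how Transformers sidestep the long-term-dependency problem the abstract attributes to LSTMs.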
Owing to privacy concerns, multi-party gradient tree boosting algorithms have become widely popular among machine learning researchers and practitioners. However, few existing works have focused on vertically partitioned datasets, and those that do are either not scalable or tend to leak information. Thus, in this work, we propose SSXGB, a scalable and acceptably secure multi-party gradient tree boosting framework for vertically partitioned datasets with partially outsourced computations. Specifically, we employ an additive homomorphic encryption (HE) scheme for security and design two sub-protocols based on it to perform the non-linear operations associated with gradient tree boosting. Next, we propose secure training and prediction algorithms under the SSXGB framework. We then provide a theoretical security and communication analysis of the proposed framework. Finally, we evaluate its performance through experiments on two real-world datasets.
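The additive HE property such frameworks rely on can be illustrated with a toy Paillier cryptosystem, where multiplying two ciphertexts yields an encryption of the plaintext sum, so parties can aggregate gradients without seeing each other's values. The parameters below are tiny and purely for illustration (real deployments use keys of 2048 bits or more), and this is a generic Paillier sketch, not the SSXGB protocol itself:

```python
import math
import random

def paillier_keygen(p=101, q=103):
    """Toy Paillier key pair with g = n + 1; p, q must be distinct primes."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                 # valid because g = n + 1
    return (n,), (n, lam, mu)            # (public key), (secret key)

def encrypt(pk, m, rng=random.Random(0)):
    (n,) = pk
    n2 = n * n
    r = rng.randrange(1, n)              # random blinding factor
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(sk, c):
    n, lam, mu = sk
    n2 = n * n
    return ((pow(c, lam, n2) - 1) // n) * mu % n

def add_encrypted(pk, c1, c2):
    """Additive homomorphism: E(m1) * E(m2) mod n^2 decrypts to m1 + m2."""
    (n,) = pk
    return (c1 * c2) % (n * n)
```

In a vertical setting, each party could encrypt its local gradient and Hessian sums under a shared public key and send only ciphertexts for aggregation; non-linear steps (such as the leaf-weight division in gradient boosting) are what require extra sub-protocols on top of this additive property.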
BACKGROUND Each year, influenza affects 3 to 5 million people and causes 290,000 to 650,000 fatalities worldwide. To reduce these fatalities, several countries have established influenza surveillance systems to collect early-warning data. However, proper and timely warnings are hindered by a 1- to 2-week delay between actual disease outbreaks and the publication of surveillance data. To avoid this delay of traditional monitoring methods, novel methods have been proposed for influenza surveillance and prediction using real-time internet data (such as search queries, microblog posts, and news). Some currently popular approaches extract online data and use machine learning to predict influenza occurrences in a classification mode. However, many of these methods extract training data subjectively, making it difficult to capture the latent characteristics of the data correctly. There is a critical need for new approaches that extract training data in a way that reflects these latent characteristics. OBJECTIVE In this paper, we propose an effective training-data extraction method that reflects the hidden features and improves prediction performance by filtering and selecting only influenza-related keywords before prediction. METHODS Although word embeddings provide a distributed representation of words by encoding the hidden relationships between tokens, we enhance them by selecting keywords related to the influenza outbreak and sorting the extracted keywords using the Pearson correlation coefficient (PCC), in order of correlation with the influenza outbreak. The keyword extraction process is followed by a predictive model based on long short-term memory (LSTM) that predicts the influenza outbreak. To assess the performance of the proposed predictive model, we use and compare a variety of word embeddings.
RESULTS Word embeddings without the proposed sorting process achieved a prediction accuracy of 0.8705 when 50.2 keywords were selected on average. In contrast, word embeddings with the proposed sorting process achieved a prediction accuracy of 0.8868, a 12.6% improvement, even though a smaller amount of training data was selected, with only 20.6 keywords on average. CONCLUSIONS The sorting process empowers the embedding process and improves feature extraction because it acts as a knowledge base for the prediction component. The model outperforms other current approaches that use flat extraction before prediction.
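The PCC-based sorting step described in METHODS can be sketched as follows: score each candidate keyword's frequency series against the influenza series and keep the most correlated ones. The keyword names and data here are invented for illustration and do not come from the paper:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_keywords(freq_series, flu_series, k):
    """Rank keywords by |PCC| with the influenza series and keep the top k."""
    ranked = sorted(freq_series,
                    key=lambda w: abs(pearson(freq_series[w], flu_series)),
                    reverse=True)
    return ranked[:k]
```

Filtering to the top-k keywords before building the embedding input is what lets the model train on fewer, more informative features, matching the RESULTS observation that accuracy improved despite fewer keywords.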