Despite advancements in vehicle security systems, auto-theft rates have increased over the last decade, and cybersecurity attacks on internet-connected and autonomous vehicles are becoming a new threat. In this paper, a deep learning model is proposed that identifies drivers from their driving behavior using vehicle telematics data. The proposed Long Short-Term Memory (LSTM) model predicts the identity of the driver from the individual's unique driving patterns learned from the telematics data. Because telematics data are time series, the problem is formulated as a time-series prediction task to exploit the embedded sequential information. The performance of the proposed approach is evaluated on three naturalistic driving datasets, on which it achieves high prediction accuracy. The robustness of the model to noisy and anomalous data, which are usually caused by sensor defects or environmental factors, is also investigated. Results show that the prediction accuracy of the proposed model remains satisfactory and exceeds that of the other approaches regardless of the extent of the anomalies and noise induced in the data.
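To illustrate the time-series formulation, the following is a minimal sketch of how a telematics recording might be sliced into fixed-length sequences before being passed to a sequence model such as an LSTM. The function name `make_windows`, the window length, the stride, and the rule of discarding windows that span two drivers are all assumptions made for illustration; the abstract does not state the preprocessing actually used.

```python
import numpy as np

def make_windows(signals, driver_ids, window=50, stride=25):
    """Slice a (T, F) telematics array into overlapping (window, F) sequences.

    Hypothetical preprocessing: a window is kept only when every sample in it
    belongs to the same driver, and that driver's id becomes the window label.
    """
    signals = np.asarray(signals, dtype=float)
    driver_ids = np.asarray(driver_ids)
    n_features = signals.shape[1]
    X, y = [], []
    for start in range(0, len(signals) - window + 1, stride):
        ids = driver_ids[start:start + window]
        if np.all(ids == ids[0]):          # skip windows that mix drivers
            X.append(signals[start:start + window])
            y.append(ids[0])
    if not X:                              # no valid window found
        return np.empty((0, window, n_features)), np.empty((0,), dtype=driver_ids.dtype)
    return np.stack(X), np.array(y)

# Toy recording: 200 time steps, 3 sensor channels, two drivers of 100 steps each.
signals = np.arange(200 * 3).reshape(200, 3)
driver_ids = np.repeat([0, 1], 100)
X, y = make_windows(signals, driver_ids)
print(X.shape, y)
```

The resulting array has the shape (samples, time steps, features), which is the layout that recurrent layers in common deep learning libraries typically expect.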
As an unsupervised learning technique, clustering can effectively capture the patterns in a data stream based on similarities among the data. Traditional data stream clustering algorithms depend heavily on prior knowledge or predefined parameters, even though the characteristics of real-time data are generally unknown. In addition, a user-specified threshold is typically used to mitigate the effect of outliers and noise, and its value significantly affects the clustering performance. The overlap among clusters is another major challenge for existing stream clustering methods. These constraints strongly limit their use in real-time applications. In this paper, a two-phase stream clustering algorithm based on fitness proportionate sharing is proposed. It handles streaming data when prior knowledge is not available and maps the clustering problem into a multimodal optimization problem. It introduces a density-based objective function and adopts the fitness proportionate sharing strategy to search more effectively for the cluster centers. To capture the dynamic characteristics of streaming data, a recursive formula for the lower bound of the density function is derived, and a summary of historical data is established for the proposed algorithm. The proposed algorithm is applied to different data sets and compared comprehensively with five well-known data stream clustering algorithms from the literature. Results show comparable or better performance relative to these five algorithms. A scalability analysis of the proposed stream clustering method is also presented.
INDEX TERMS Data streams, clustering, unsupervised learning, data mining.
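As a rough illustration of treating clustering as a multimodal optimization problem, the sketch below scores candidate cluster centers with a Gaussian-kernel density and then applies classic fitness sharing, in which each candidate's fitness is divided by a niche count. This is not the fitness proportionate sharing strategy of the proposed algorithm, whose details are not given in the abstract; the kernel, the `bandwidth`, the `niche_radius`, and both function names are assumptions chosen only to show why sharing keeps candidates on several density peaks instead of letting one crowded peak dominate.

```python
import numpy as np

def density(candidates, data, bandwidth=1.0):
    """Gaussian-kernel density of each candidate center with respect to the data.

    Assumed objective for illustration; the paper's density function may differ.
    """
    d2 = ((candidates[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * bandwidth ** 2)).sum(axis=1)

def shared_fitness(candidates, fitness, niche_radius=1.0, alpha=1.0):
    """Classic fitness sharing: divide raw fitness by the niche count."""
    dist = np.linalg.norm(candidates[:, None, :] - candidates[None, :, :], axis=2)
    # Sharing function: 1 at distance 0, falling to 0 at the niche radius.
    sharing = np.where(dist < niche_radius, 1.0 - (dist / niche_radius) ** alpha, 0.0)
    niche_count = sharing.sum(axis=1)  # at least 1, since each candidate shares with itself
    return fitness / niche_count

# Two small, well-separated groups of points.
data = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [-0.1, 0.0],
                 [10.0, 10.0], [10.1, 10.0], [10.0, 10.1], [9.9, 10.0]])
# Three candidates crowd the first group; one candidate sits on the second.
candidates = np.array([[0.0, 0.0], [0.05, 0.0], [0.0, 0.05], [10.0, 10.0]])

raw = density(candidates, data)
shared = shared_fitness(candidates, raw)
print(raw)
print(shared)
```

All four candidates have nearly equal raw density, but after sharing the lone candidate on the second group keeps its full fitness while the three crowded candidates are penalized, so a selection step would retain a representative of each peak.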
I. INTRODUCTION