Data is the new oil for the car industry. Cars generate data about how they are used and who's behind the wheel which gives rise to a novel way of profiling individuals. Several prior works have successfully demonstrated the feasibility of driver re-identification using the in-vehicle network data captured on the vehicle's CAN (Controller Area Network) bus. However, all of them used signals (e.g., velocity, brake pedal or accelerator position) that have already been extracted from the CAN log which is itself not a straightforward process. Indeed, car manufacturers intentionally do not reveal the exact signal location within CAN logs. Nevertheless, we show that signals can be efficiently extracted from CAN logs using machine learning techniques. We exploit that signals have several distinguishing statistical features which can be learnt and effectively used to identify them across different vehicles, that is, to quasi "reverse-engineer" the CAN protocol. We also demonstrate that the extracted signals can be successfully used to re-identify individuals in a dataset of 33 drivers. Therefore, not revealing signal locations in CAN logs per se does not prevent them to be regarded as personal data of drivers.
Data generated by cars is growing at an unprecedented scale. As cars gradually become part of the Internet of Things (IoT) ecosystem, several stakeholders discover the value of in-vehicle network logs containing the measurements of the multitude of sensors deployed within the car. This wealth of data is also expected to be exploitable by third parties for the purpose of profiling drivers in order to provide personalized, valueadded services. Although several prior works have successfully demonstrated the feasibility of driver re-identification using the in-vehicle network data captured on the vehicle's CAN (Controller Area Network) bus, they inferred the identity of the driver only from known sensor signals (such as the vehicle's speed, brake pedal position, steering wheel angle, etc.) extracted from the CAN messages. However, car manufacturers intentionally do not reveal exact signal location and semantics within CAN logs. We show that the inference of driver identity is possible even with off-the-shelf machine learning techniques without reverse-engineering the CAN protocol. We demonstrate our approach on a dataset of 33 drivers and show that a driver can be re-identified and distinguished from other drivers with an accuracy of 75-85%. 2 bit.ly/2V14YkK 3 www.wsj.com/articles/what-your-car-knows-about-you-1534564861
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.