One challenge in using naturalistic driving data is producing a holistic analysis of these highly variable datasets. Typical analyses focus on isolated events, such as large g-force accelerations indicating a possible near-crash. Examining isolated events is ill-suited for identifying patterns in continuous activities such as maintaining vehicle control. We present an alternative approach that converts driving data into a text representation and uses topic modeling to identify patterns across the dataset. This approach enables the discovery of non-linear patterns, reduces the dimensionality of the data, and captures subtle variations in driver behavior. In this study, topic models are used to concisely describe patterns in trips from drivers with and without untreated obstructive sleep apnea (OSA). The analysis included 5000 trips (50 trips from each of 100 drivers: 66 drivers with OSA and 34 comparison drivers). Trips were treated as documents, and speed and acceleration data from the trips were converted to “driving words.” The identified patterns, called topics, were determined based on regularities in the co-occurrence of the driving words within the trips. This representation was used in random forest models to predict the driver condition (i.e., OSA or comparison) for each trip. Models with 10, 15 and 20 topics had better accuracy in predicting the driver condition, with a maximum AUC of 0.73 for a model with 20 topics. Trips from drivers with OSA were more likely to be defined by topics for smaller lateral accelerations at low speeds. The results demonstrate topic modeling as a useful tool for extracting meaningful information from naturalistic driving datasets.
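The idea of treating trips as documents made of "driving words" can be illustrated with a minimal sketch. The bin edges, labels, and function names below are hypothetical and chosen only for illustration; the abstract does not state how the study discretized speed and acceleration. Each sample is mapped to a word that combines a speed bin with a lateral-acceleration bin, and each trip becomes a bag of word counts, which is the kind of input a topic model works from.

```python
from collections import Counter

# Hypothetical bin edges for illustration only; the study's actual
# discretization is not given in the abstract.
SPEED_EDGES = [10.0, 25.0]    # m/s, separating "low", "mid", "high"
LAT_ACC_EDGES = [0.5, 2.0]    # m/s^2 magnitude, separating "small", "medium", "large"
SPEED_LABELS = ["low", "mid", "high"]
LAT_ACC_LABELS = ["small", "medium", "large"]


def bin_label(value, edges, labels):
    """Return the label of the first bin whose upper edge exceeds value."""
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]


def trip_to_words(speeds, lat_accs):
    """Convert paired speed and lateral-acceleration samples to driving words."""
    words = []
    for speed, acc in zip(speeds, lat_accs):
        speed_bin = bin_label(speed, SPEED_EDGES, SPEED_LABELS)
        acc_bin = bin_label(abs(acc), LAT_ACC_EDGES, LAT_ACC_LABELS)
        words.append(f"speed_{speed_bin}__lat_{acc_bin}")
    return words


def document_term_counts(trips):
    """Treat each trip as a document and count its driving words."""
    return [Counter(trip_to_words(speeds, accs)) for speeds, accs in trips]


# Two toy trips, each given as (speeds, lateral accelerations).
trips = [
    ([5.0, 6.0, 30.0], [0.1, -0.2, 2.5]),
    ([12.0, 14.0], [0.8, -1.0]),
]
counts = document_term_counts(trips)
```

The per-trip counts could then be passed to a topic model, with the resulting topic proportions used as features for a classifier such as the random forest mentioned above.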
This paper introduces Probabilistic Topic Modeling (PTM) as a promising approach to naturalistic driving data analyses. Naturalistic driving data present an unprecedented opportunity to understand driver behavior. Novel strategies are needed to achieve a more complete picture of these datasets than is provided by the local event-based analytic strategy that currently dominates the field. PTM is a text analysis method for uncovering word-based themes across documents. In this application, documents were represented by drives and words were created from speed and acceleration data using Symbolic Aggregate approximation (SAX). A twenty-topic Latent Dirichlet Allocation (LDA) topic model was developed using words from 10,705 documents (real-world drives) by 26 drivers. The resulting LDA model clustered the drives into meaningful topics. Topic membership probabilities were successfully used as features in subsequent analyses to differentiate between healthy drivers and those suffering from Obstructive Sleep Apnea.
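As a rough illustration of how SAX can turn a numeric signal into a word, the sketch below follows the standard SAX steps: z-normalize the series, average it over equal-length segments (piecewise aggregate approximation), and map each segment mean to a letter using breakpoints that split the standard normal distribution into equiprobable regions. The function name, the four-letter alphabet, and the segment count are assumptions made for the example; the paper's actual SAX parameters are not stated in the abstract.

```python
from statistics import NormalDist, mean, pstdev


def sax_word(series, n_segments, alphabet="abcd"):
    """Encode a numeric series as a SAX word (illustrative sketch).

    Assumes len(series) >= n_segments; trailing samples that do not fill
    a whole segment are ignored for simplicity.
    """
    if n_segments < 1 or len(series) < n_segments:
        raise ValueError("series must contain at least n_segments samples")

    # 1. Z-normalize so the breakpoints of a standard normal apply.
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        z = [0.0] * len(series)
    else:
        z = [(x - mu) / sigma for x in series]

    # 2. Piecewise aggregate approximation: mean of each equal-length segment.
    seg_len = len(z) // n_segments
    paa = [mean(z[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]

    # 3. Breakpoints that cut the standard normal into equiprobable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]

    # 4. Map each segment mean to the letter of the region it falls in.
    letters = []
    for value in paa:
        index = sum(value > b for b in breakpoints)
        letters.append(alphabet[index])
    return "".join(letters)


rising = sax_word([0, 0, 1, 1, 2, 2, 3, 3], n_segments=4)
falling = sax_word([3, 3, 2, 2, 1, 1, 0, 0], n_segments=4)
```

In an application like the one described, words of this kind, built from windows of speed and acceleration data, would form the vocabulary of each drive before fitting the LDA model.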
Summary: Drivers' expectations influence their responses to events in complex ways. In particular, covert and sustained hazards, like crosswinds, might require complex vehicle control adaptations. We investigated differences between drivers' lateral responses in unexpected and expected (repeated) crosswind events using probabilistic topic modeling. First, each driver's event-based steering wheel movements (angle and rate, 5 Hz) were transformed into symbolic words. Then, probabilistic topic modeling was used to discover patterns in the steering wheel movement data across the event conditions. Results indicate that drivers may make fewer abrupt steering wheel movements when they encounter unexpected crosswinds. In contrast, drivers are more likely to make continuous, faster steering corrections to compensate for crosswinds when they are expected. The topic models also classify unexpected and expected crosswind events better than traditional models that use single aggregated values across events (maximum steering wheel angle and rate). These preliminary insights show an advantage for granular, time-series-based analysis of driving data, and suggest a viable machine-learning-based technique to conduct such investigations.