Spatiotemporal Feature Enhancement Aids the Driving Intention Inference of Intelligent Vehicles

Chen, Huiqin; Chen, Hailong; Liu, Hao; Feng, Xiexing

doi:10.3390/ijerph191811819

Cited by 4 publications

(8 citation statements)

References 46 publications

(62 reference statements)


“…Some researchers have utilized 3D Conv to model the temporal information of video frame sequences. Chen et al proposed a two-stream structure based on a deep three-dimensional CNN [16], while Rong et al used 3D-ResNet to extract video spatiotemporal features [10]. Recent studies have demonstrated that CNN-based algorithms achieve state-of-the-art (SOTA) performance on datasets such as Brain4Car [1].…”

Section: Driver Intention Prediction By CNN (mentioning)

confidence: 99%

“…To address this, many research teams have emphasized the importance of cross-modal information interaction and designed effective multi-modal fusion methods. However, these methods typically focus on multimodal fusion in a single dimension (i.e., either feature extractor or classifier) [1,10,16-18], and often overlook the potential benefits of incorporating GPS information. This is a noteworthy limitation in the field that warrants further exploration.…”

Section: Cross-modal Information Interaction (mentioning)

confidence: 99%

“…The effective interaction of multimodal information during feature extraction is critical for successful cross-modal fusion. However, previous studies have often overlooked this aspect [1,10,16-18]. In this paper, we propose a novel information interaction mechanism that avoids the performance limitations of simply concatenating features or increasing the computational load of the network through excessive use of two-dimensional convolutional layer stacks.…”

Section: Cross-modality Channel-spatial Weight Mechanism (mentioning)

confidence: 99%

“…Deep learning has recently gained extensive development and application in various fields. Due to the principle of using a large number of neurons to simulate human perception, thinking, and other activities, researchers have employed deep learning to address driver intention prediction, with promising outcomes [1,10,16-18]. Generally, deep learning-based driver intention prediction approaches offer advantages such as automatic feature learning, end-to-end learning, and more comprehensive and superior performance of the learned features.…”

Section: Introduction (mentioning)

confidence: 99%

“…Generally, deep learning-based driver intention prediction approaches offer advantages such as automatic feature learning, end-to-end learning, and more comprehensive and superior performance of the learned features. However, most studies utilize 3D Conv [10,16,19], optical flow [1], or stacking of LSTM [3,16-18,20] to model the temporal information of video sequences, leading to issues of large network parameters and high algorithm deployment costs. Moreover, it is noteworthy that most studies neglect or do not effectively utilize GPS information.…”

Section: Introduction (mentioning)

confidence: 99%


Driver intention prediction based on multi-dimensional cross-modality information interaction

Xue, Xu, Qiao et al. 2024

Multimedia Systems


Driver intention prediction allows drivers to perceive possible dangers in the fastest time and has become one of the most important research topics in the field of self-driving in recent years. In this study, we propose a driver intention prediction method based on multi-dimensional cross-modality information interaction. First, an efficient video recognition network is designed to extract channel-temporal features of in-side (driver) and out-side (road) videos respectively, in which we design a cross-modality channel-spatial weight mechanism to achieve information interaction between the two feature extraction networks corresponding respectively to the two modalities, and we also introduce a contrastive learning module by which we force the two feature extraction networks to enhance structural knowledge interaction. Then, the obtained representations of in- and out-side videos are fused using a Res-Layer based module to get a preliminary prediction which is then corrected by incorporating the GPS information to obtain a final decision. Besides, we employ a multi-task framework to train the entire network. We validate the proposed method on the public dataset Brain4Car, and the results show that the proposed method achieves competitive results in accuracy while balancing performance and computation.


Driver Intention Prediction Based on Multi-Dimensional Cross-Modality Information Interaction

Meng, Zheng et al. 2023

Preprint



STA-Net: A Spatial–Temporal Joint Attention Network for Driver Maneuver Recognition, Based on In-Cabin and Driving Scene Monitoring

He, Yu, Wang et al. 2024

Applied Sciences


Next-generation advanced driver-assistance systems (ADASs) are a promising direction for intelligent transportation systems. To achieve intelligent security monitoring, it is imperative that vehicles possess the ability to accurately comprehend driver maneuvers amidst diverse driver behaviors and complex driving scenarios. Existing CNN-based and transformer-based driver maneuver recognition methods face challenges in effectively capturing global and local features across temporal and spatial dimensions. This paper proposes a Spatial–Temporal Joint Attention Network (STA-Net) to realize highly efficient temporal and spatial feature extraction in driver maneuver recognition. First, we introduce a two-stream architecture for a concurrent analysis of in-cabin driver behaviors and out-cabin environmental information. Second, we propose a Multi-Scale Transposed Attention (MSTA) module and Multi-Scale Feedforward Network (MSFN) to extract features at multiple scales, addressing receptive field inadequacies and combining high-level and low-level information. Third, to address the information redundancy in multi-scale features, we propose a Cross-Spatial Attention Module (CSAM) and Multi-Scale Cross-Spatial Fusion Module (MCFM) to select essential features. Additionally, we introduce an asymmetric loss function to effectively tackle the issue of sample imbalance across diverse categories of driving maneuvers. The proposed method achieves an accuracy of 90.97% and an F1 score of 89.37% on the Brain4Cars dataset, surpassing the performance of the methods compared. These results indicate that our approach effectively enhances driver maneuver recognition.
