Most modern multi-object tracking (MOT) systems for video follow the tracking-by-detection paradigm: objects of interest are first located in each frame and then associated across frames to form complete trajectories. In this setting, the appearance features of objects usually provide the most important cues for data association, but they are highly susceptible to occlusions, illumination variations, and inaccurate detections, which easily leads to incorrect trajectories. To address this issue, we propose to make full use of neighboring information. Our motivation derives from the observation that people tend to move in groups: when an individual target's appearance changes markedly, an observer can still identify it from the context of its neighbors. To model this contextual information, we first exploit the spatiotemporal relations among trajectories to efficiently select suitable neighbors for each target. We then construct a neighbor graph over each target and its selected neighbors, and employ graph convolutional networks (GCNs) to model their relations and learn graph features. To the best of our knowledge, this is the first work to explicitly leverage neighbor cues via GCNs in MOT. Finally, standardized evaluations on the MOT16 and MOT17 datasets demonstrate that our approach substantially reduces identity switches while achieving state-of-the-art overall performance.
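To make the neighbor-graph step concrete, the following is a minimal sketch in PyTorch of how a single GCN layer could aggregate a target's appearance feature with those of its selected neighbors over a star graph. The star topology, single-layer design, and the `NeighborGCN` name are illustrative assumptions for exposition, not the paper's exact architecture.

```python
# Minimal sketch of a neighbor-graph GCN, assuming PyTorch.
# Star topology and single layer are illustrative, not the authors' design.
import torch
import torch.nn as nn


class NeighborGCN(nn.Module):
    """One GCN layer over a star graph: target node 0 linked to K neighbors."""

    def __init__(self, feat_dim: int, out_dim: int):
        super().__init__()
        self.weight = nn.Linear(feat_dim, out_dim, bias=False)

    def forward(self, target_feat: torch.Tensor,
                neighbor_feats: torch.Tensor) -> torch.Tensor:
        # Stack target + neighbors into node feature matrix H: (K+1, feat_dim)
        h = torch.cat([target_feat.unsqueeze(0), neighbor_feats], dim=0)
        n = h.size(0)
        # Star-graph adjacency with self-loops: node 0 <-> every neighbor
        adj = torch.eye(n, device=h.device)
        adj[0, 1:] = 1.0
        adj[1:, 0] = 1.0
        # Symmetric normalization: D^{-1/2} (A + I) D^{-1/2}
        deg = adj.sum(dim=1)
        d_inv_sqrt = deg.pow(-0.5)
        adj_norm = adj * d_inv_sqrt.unsqueeze(0) * d_inv_sqrt.unsqueeze(1)
        # Graph convolution + ReLU; return the target node's
        # context-enhanced embedding for use in data association.
        h_out = torch.relu(adj_norm @ self.weight(h))
        return h_out[0]


# Usage: enhance a target's 128-d appearance feature with 5 neighbors.
gcn = NeighborGCN(feat_dim=128, out_dim=128)
target = torch.randn(128)
neighbors = torch.randn(5, 128)
enhanced = gcn(target, neighbors)  # shape: (128,)
```

Returning only the target node's output reflects the intent that the graph exists to contextualize a single target; pooling over all nodes would be an equally plausible readout choice.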