2018
DOI: 10.1007/978-3-030-01267-0_15
On Offline Evaluation of Vision-Based Driving Models

Abstract: Autonomous driving models should ideally be evaluated by deploying them on a fleet of physical vehicles in the real world. Unfortunately, this approach is not practical for the vast majority of researchers. An attractive alternative is to evaluate models offline, on a pre-collected validation dataset with ground truth annotation. In this paper, we investigate the relation between various online and offline metrics for evaluation of autonomous driving models. We find that offline prediction error is not necessa…


Cited by 85 publications (78 citation statements)
References 20 publications
“…This proves that online test is the real significant indicator for IL when it is used for active control. Note that this is in line with the findings of [7], which highlights that the correlation between offline metrics and online performance is weak. Table 5: Comparison of MAE on train and validation data (in m), with none, partial and full data augmentation (None, Partial, Full), less is better. Table 5 also shows that the error is greater for the neighbors than for the ego.…”
Section: Ablation Studies (supporting, confidence: 88%)
“…We validate every 20k iterations, and if the validation error increases for three iterations we stop the training process and use this checkpoint to test on the benchmarks, both CARLA and NoCrash. We build a validation dataset as described in [9].…”
Section: Training Details (mentioning, confidence: 99%)
“…While imitation learning based approaches have shown important progress in autonomous driving [27, 28, 29, 30], they present limitations when deployed in environments beyond the training distribution [31]. These driving models relying on supervised techniques are often evaluated on performance metrics on pre-collected validation datasets [32]; however, low prediction error in offline testing is not necessarily correlated with driving quality [33]. Even when demonstrating desirable performance during closed-loop testing in naturalistic driving scenarios, imitation learning models often degrade in performance due to distributional shift [26], unpredictable road users [34], or causal confusion [35] when exposed to a variety of driving scenarios.…”
Section: Related Work (mentioning, confidence: 99%)