2020
DOI: 10.1007/s10458-020-09480-9

Efficient policy detecting and reusing for non-stationarity in Markov games

Abstract: This paper studies efficient policy detection and reuse when playing against non-stationary opponents in Markov games. Building on Bayesian Policy Reuse+ (BPR+), the proposed approach maintains a belief over the opponent's policy to detect behavior switches during online interaction, and stores the learned best responses in a distilled policy network (DPN) so that they can be efficiently reused or extended, yielding the DPN-BPR+ algorithm.

Cited by 13 publications (5 citation statements) | References 38 publications
“…As π_i^1 and π_i^2 are jointly trained to maximize performance, the optimal response to π_i^2 is already contained within π_i^1. Thus, inspired by recent success in distillation (Zheng et al. 2021), we treat π_i^1 as an expert and let π_M distill its policy. During sampling, we collect across-episodic trajectories, denoted as h_t = (o_0, a_0, r_0, …, o_t, a_t, r_t).…”
Section: Approach
confidence: 99%
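The statement above describes standard policy distillation: the jointly trained policy π_i^1 acts as a fixed expert, and a student network π_M is trained to match its action distribution on observations gathered from across-episodic trajectories. Below is a minimal PyTorch-style sketch of one such update; the network sizes, the temperature, and the KL-based loss are illustrative assumptions, not the cited paper's exact setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical student network pi_M; layer sizes are illustrative assumptions.
class PolicyNet(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),  # outputs action logits
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.body(obs)

def distill_step(student: PolicyNet,
                 optimizer: torch.optim.Optimizer,
                 obs_batch: torch.Tensor,
                 expert_logits: torch.Tensor,
                 temperature: float = 1.0) -> float:
    """One distillation update: make the student's action distribution
    match the expert's (pi_i^1) on observations drawn from trajectories
    h_t = (o_0, a_0, r_0, ..., o_t, a_t, r_t)."""
    # KL(expert || student), the usual policy-distillation objective.
    expert_probs = F.softmax(expert_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student(obs_batch) / temperature, dim=-1)
    loss = F.kl_div(student_log_probs, expert_probs, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```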
“…Instead of applying BPR to detect opponents' policies, another approach named Bayes-Theory of Mind on Policy (Bayes-ToMoP) is proposed in [152] to efficiently detect the strategy of opponents using either stationary or higher-level reasoning strategies. By combining BPR+ with the distilled policy network (DPN), DPN-BPR+ is proposed in [153] to study efficient policy detecting and reusing techniques in non-stationary MGs. Non-stationarity also prevents the straightforward use of experience replay, which is crucial for stabilizing RL.…”
Section: B. Microagent Behavioral Intervention
confidence: 99%
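BPR-style detection, as referenced above, maintains a Bayesian belief over a library of known opponent policies, updates it from an observed performance signal (e.g., the episode return), and reuses the stored best response for the most likely opponent. The sketch below is a generic illustration under assumed Gaussian performance models; the policy library and the signal models are hypothetical stand-ins, not DPN-BPR+'s exact components.

```python
import math

# Hypothetical performance models: for each known opponent policy tau,
# the (mean, std) of the return expected when playing our stored best
# response against it. Gaussian signal models are an assumption here.
PERFORMANCE_MODEL = {
    "tau_aggressive": (10.0, 2.0),
    "tau_defensive": (4.0, 2.0),
    "tau_random": (0.0, 3.0),
}

def gaussian_pdf(x: float, mean: float, std: float) -> float:
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def update_belief(belief: dict, observed_return: float) -> dict:
    """One Bayesian belief update over opponent policies:
    beta'(tau) ∝ P(signal | tau) * beta(tau)."""
    posterior = {
        tau: gaussian_pdf(observed_return, *PERFORMANCE_MODEL[tau]) * p
        for tau, p in belief.items()
    }
    z = sum(posterior.values())
    return {tau: p / z for tau, p in posterior.items()}

# Start from a uniform belief, observe one episode return, and reuse
# the stored response for the most likely opponent policy.
belief = {tau: 1.0 / len(PERFORMANCE_MODEL) for tau in PERFORMANCE_MODEL}
belief = update_belief(belief, observed_return=9.2)
detected = max(belief, key=belief.get)
print(detected, belief)
```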
“…A common assumption is that all players have fixed strategy sets. Under this assumption, agents could maintain more sophisticated beliefs about their opponent (Zheng et al., 2018) and extend this to recursive-reasoning procedures (Yang et al., 2019). These lines of work focus more on other-player policy identification and are a promising future direction for improving the quality of the OPC.…”
Section: Related Work: Multiagent Learning
confidence: 99%