2017
DOI: 10.48550/arxiv.1703.02660
Preprint

Towards Generalization and Simplicity in Continuous Control

Abstract: This work shows that policies with simple linear and RBF parameterizations can be trained to solve a variety of widely studied continuous control tasks, including the OpenAI gym benchmarks. The performance of these trained policies is competitive with state-of-the-art results obtained with more elaborate parameterizations such as fully connected neural networks. Furthermore, the standard training and testing scenarios for these tasks are shown to be very limited and prone to overfitting, thus giving rise to …

Citations: cited by 10 publications (30 citation statements)
References: 17 publications
“…$u_{\text{exog}}(t)$ can be determined using $u_r(t)$, $x_r(t)$, and (37), as $\varphi(x)$ is a known function of $x$. It is clear from (39)-(41) that if there are no parametric uncertainties, and if the initial conditions of (35) are identical to those of (36), with $K(0) = I$, $\Theta_{nl}(0) = \Theta_{nl,r}$ and $\Theta_{l}(0) = \Theta_{l,r}$, then the AC-RL policy coincides with $u_r(t)$. For the rest of this paper, unless otherwise mentioned, we choose $Q = 2I$. The following theorem presents the stability property of the AC-RL as well as Regret (defined in (14)):…”
Section: AC-RL (mentioning)
confidence: 99%
“…Theorem 2. Under Assumptions A1-A2, A4', and A5, the closed-loop adaptive system specified by (35), (36), (38) and (43) has globally bounded solutions with $\lim_{t \to \infty} e(t) = 0$ and $R = O(1)$.…”
Section: Htac-rl (mentioning)
confidence: 99%