2018
DOI: 10.1007/s10589-018-9990-5

Proximal algorithms and temporal difference methods for solving fixed point problems


Cited by 6 publications (6 citation statements); citing publications span 2019–2024. References 56 publications.
“…The choice of λ embodies the important bias-variance tradeoff: larger values of λ lead to better approximation of J_µ, but require a larger number of simulation samples because of increased simulation noise (see the discussion in Section 6.3.6 of [Ber12]). An important insight is that the operator T^{(λ)}_µ is closely related to the proximal operator of convex analysis (with λ corresponding to the penalty parameter of the proximal operator), as shown in the author's paper [Ber16a] (see also the monograph [Ber18a], Section 1.2.5, and the paper [Ber18b]). In particular, TD(λ) can be viewed as a stochastic simulation-based version of the proximal algorithm.…”
Section: Indirect Methods Based On Projected Equations
Citation type: mentioning, confidence: 99%
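One way to make this relation concrete is the extrapolation identity for the linear case: for a fixed point problem x = Ax + b with contractive A, penalty parameter c > 0, and λ = c/(c+1), the multistep mapping T^{(λ)} coincides with an extrapolated proximal step. The snippet below is a minimal numerical sketch of that identity (the matrices, vectors, and parameters are synthetic, chosen only for illustration):

```python
# Minimal sketch (illustrative, not code from the paper): for x = A x + b with
# contractive A, penalty c > 0 and lam = c / (c + 1), the proximal step
#   P_c(x) = (I + c (I - A))^{-1} (x + c b)
# and the multistep mapping
#   T_lam(x) = (1 - lam) * sum_{l >= 0} lam^l * T^{l+1}(x),   T(x) = A x + b,
# satisfy the extrapolation formula T_lam(x) = x + (P_c(x) - x) / lam.
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
A *= 0.9 / np.linalg.norm(A, 2)       # scale so that T is a contraction
b = rng.standard_normal(n)
x = rng.standard_normal(n)

c = 3.0
lam = c / (c + 1.0)
I = np.eye(n)

# Proximal (resolvent) step for the equation (I - A) x = b
P_c = np.linalg.solve(I + c * (I - A), x + c * b)

# Multistep mapping T^(lambda) applied to x, truncated at many terms
y, T_lam = x.copy(), np.zeros(n)
for l in range(500):
    y = A @ y + b                      # y = T^{l+1}(x)
    T_lam += (1 - lam) * lam**l * y

print(np.allclose(T_lam, x + (P_c - x) / lam))   # True up to truncation error
```

As λ approaches 1 the multistep/proximal step moves essentially all the way to the fixed point in a single application, which is the deterministic counterpart of the bias-variance tradeoff described in the quotation.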
“…is a projected equation, which is related to the proximal algorithm [Ber16a], [Ber18b], and may be solved by using temporal differences. Thus we may use exploration-enhanced versions of the LSTD(λ) and LSPE(λ) methods in an approximate PI scheme to solve the λ-aggregation equation.…”
Section: λ-Aggregation
Citation type: mentioning, confidence: 99%
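For context, the core recursion of plain LSTD(λ) with linear features, which such a scheme builds on, can be sketched as below (a textbook-style illustration, not the exploration-enhanced variant referred to above; the feature map, discount factor, and sampled trajectory are placeholders):

```python
# Hedged sketch of plain LSTD(lambda) for evaluating a fixed policy with linear
# features; exploration-enhanced variants change how the trajectory is generated,
# not this core recursion. All data below are illustrative.
import numpy as np

def lstd_lambda(trajectory, phi, gamma=0.95, lam=0.7):
    """trajectory: list of (state, cost, next_state) generated under the policy.
    phi: callable mapping a state to a feature vector of dimension d."""
    d = len(phi(trajectory[0][0]))
    A = np.zeros((d, d))
    b = np.zeros(d)
    z = np.zeros(d)                               # eligibility trace
    for s, g, s_next in trajectory:
        z = gamma * lam * z + phi(s)
        A += np.outer(z, phi(s) - gamma * phi(s_next))
        b += z * g
    return np.linalg.solve(A, b)                  # r with J(s) ~ phi(s) . r

# Example on a 3-state cycle with one-hot features (synthetic data)
phi = lambda s: np.eye(3)[s]
traj = [(0, 1.0, 1), (1, 0.0, 2), (2, 2.0, 0)] * 200
print(lstd_lambda(traj, phi))
```

Solving the accumulated system A r = b once is what distinguishes LSTD(λ) from the incremental TD(λ) update; LSPE(λ) instead iterates using the same simulation-generated quantities.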
“…Except the cases in which U(x) is not a singleton for a finite number of x, which we refer to as trivial cases, M being finite implies that the state space is finite. Therefore, except in the trivial cases, with the following finite policy assumption, the λ-operator T^{(λ)}_µ is ensured to be well-posed (see (Bertsekas, 2018b, Proposition 2.1)), and the monotonicity of the underlying operator H is not required for the desired behavior.…”
Section: λ-PIR
Citation type: mentioning, confidence: 99%
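For reference, the λ-operator in question is the standard multistep mapping associated with a policy µ, stated here in the usual form (well-posedness amounts to convergence of this series for the cost functions considered):

```latex
% Standard multistep (lambda-) operator for a policy mu; it is well posed
% whenever the series converges for the functions J of interest.
\[
  T^{(\lambda)}_{\mu} J \;=\; (1-\lambda) \sum_{\ell=0}^{\infty} \lambda^{\ell}\, T_{\mu}^{\ell+1} J,
  \qquad 0 < \lambda < 1 .
\]
```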
“…A survey can be found in Bertsekas (2012). Most recently, the connection between TD(λ) and proximal algorithms, which are widely used for solving convex optimization problems, is discussed in Bertsekas (2018b). In light of this relation, λ-PI with randomization (λ-PIR) was proposed in (Bertsekas, 2018a, Chapter 2).…”
Section: Introduction
Citation type: mentioning, confidence: 99%
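A hedged sketch of the underlying λ-PI iteration is given below (plain λ-PI on a synthetic finite MDP, not the randomized λ-PIR variant cited above; the transition model, costs, and discount factor are illustrative):

```python
# Hedged sketch of lambda-policy iteration on a small random MDP (costs are
# minimized, Bertsekas-style). This is plain lambda-PI, not the randomized
# lambda-PIR variant; all model data are synthetic.
import numpy as np

rng = np.random.default_rng(1)
nS, nA, gamma, lam = 4, 2, 0.9, 0.7
P = rng.random((nA, nS, nS))
P /= P.sum(axis=2, keepdims=True)              # P[u, x, y] = Prob(y | x, u)
g = rng.random((nA, nS))                       # stage cost g(x, u)

def T_lam(J, mu):
    """Apply the multistep operator T^(lambda)_mu to J; for a finite MDP the
    mapping T_mu is affine, so the defining series can be summed in closed form."""
    P_mu = P[mu, np.arange(nS)]                # row x: P(. | x, mu(x))
    g_mu = g[mu, np.arange(nS)]
    M = np.linalg.inv(np.eye(nS) - lam * gamma * P_mu)
    return (1 - lam) * gamma * P_mu @ (M @ J) + M @ g_mu

J = np.zeros(nS)
for k in range(200):
    Q = g + gamma * P @ J                      # Q[u, x] = g(x, u) + gamma E[J]
    mu = Q.argmin(axis=0)                      # greedy (cost-minimizing) policy
    J = T_lam(J, mu)                           # single multistep evaluation

print(mu, J)
```

With λ = 0 the iteration reduces to value iteration, and as λ approaches 1 it approaches standard policy iteration, which is how the λ-PI family is usually positioned.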
“…presented in [11]. A class of PI algorithms based on temporal difference learning and the λ-operator is proposed in [12], which has been further extended using abstract dynamic programming [13] and randomized proximal methods [14], [15]. An alternative family of model-based tabular PI algorithms with multi-step greedy policy improvement is derived in [16], [17].…”
Section: Introduction
Citation type: mentioning, confidence: 99%