Using Randomization to Break the Curse of Dimensionality

Rust, John

doi:10.2307/2171751

Cited by 321 publications

(240 citation statements)

References 29 publications

(22 reference statements)

Citation types: Supporting, Mentioning (237), Contrasting

“…Similarly, in infinite-horizon problems we compute a solution to $\hat{J} = T_{\hat{\Gamma}_a}\hat{J}$ by using the infinite-horizon version of value iteration. It is helpful to note that $\hat{\Gamma}_a$ is "self-approximating" in the sense that in order to characterize $\hat{J}$ it suffices to find a set of function values at the locations $y_s^a$ satisfying $\hat{J}(y_s^a) = (T_{\hat{\Gamma}_a}\hat{J})(y_s^a)$ (see also Rust, 1997). Then the value of $\hat{\Gamma}_a\hat{J}(x)$ at new locations $x \neq y_s^a$ can be derived directly from the definition of $\hat{\Gamma}_a$ in (3).…”

Section: Approximate Dynamic Programming (mentioning)

confidence: 99%

“…The proof of Lemma 1, which is analogous to Rust's proof of a corresponding theorem regarding density-based random operators (Rust, 1997), can be found in Appendix A.2. Note that we restrict ourselves to the interval [b, 1 − b] to avoid boundary effects of the weighting kernel.…”

Section: Lemma 1 For Any Lipschitz Continuous Element J Of C([0 1] (mentioning)

confidence: 99%

“…Besides guiding the practical choice of b, Theorem 3 provides insight into the question of whether simulation-based methods are suited to "break" the curse of dimensionality. This question is the main focus of Rust's analysis of density-based random approximations of the Bellman operator (Rust, 1997). In particular, Rust concludes that if the transition density p(y | x, a) is known, an algorithm similar to ours can be used to approximate the value function in a computation time that is only polynomial in d. By contrast, Theorem 3 suggests that for kernel-based reinforcement learning, where p(y | x, a) is unknown, the number of observations grows exponentially in d even if b is chosen optimally.…”

Section: Theorem 3 the Optimal Convergence Rate That May Be Obtained (mentioning)

confidence: 99%

“…In the context of reinforcement learning, local averaging has been suggested in work by Rust (1997) and Gordon (1999), making the assumption that the transition probabilities of the MDP are known and can be used for learning. Our approach is fundamentally different in that kernel-based reinforcement learning only relies on the sample trajectories of the MDP.…”

Section: Introduction (mentioning)

confidence: 99%
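The statements above describe a density-based random Bellman operator that is "self-approximating": its fixed point only has to be found at the sampled states, and the same formula then gives values at any other state. The following is a minimal Python sketch of that idea, assuming a known transition density. The toy dynamics, reward, action set, discount factor, and all function names here are hypothetical choices for illustration and are not taken from Rust (1997) or from the citing papers.

```python
import math
import random


def make_random_bellman(samples, density, reward, actions, beta):
    """Build a random Bellman operator on a fixed set of sampled states.

    The conditional expectation of V is replaced by a weighted average of
    V at the samples, with weights p(y_i | x, a) normalised to sum to one.
    """
    def operator(values, x):
        best = -math.inf
        for a in actions:
            weights = [density(y, x, a) for y in samples]
            total = sum(weights)
            expected = sum(w * v for w, v in zip(weights, values)) / total
            best = max(best, reward(x, a) + beta * expected)
        return best
    return operator


def solve_at_samples(operator, samples, tol=1e-8, max_iter=10_000):
    """Value iteration restricted to the sampled states."""
    values = [0.0] * len(samples)
    for _ in range(max_iter):
        new_values = [operator(values, y) for y in samples]
        if max(abs(n - o) for n, o in zip(new_values, values)) < tol:
            return new_values
        values = new_values
    return values


# Hypothetical toy problem on the state space [0, 1] (all assumptions).
rng = random.Random(0)
samples = [rng.random() for _ in range(30)]  # uniform random draws
ACTIONS = (0.0, 1.0)
BETA = 0.9


def density(y, x, a):
    # Unnormalised Gaussian bump around 0.5 * x + 0.5 * a; the operator
    # normalises the weights across the samples.
    mean = 0.5 * x + 0.5 * a
    return math.exp(-((y - mean) ** 2) / (2 * 0.1 ** 2))


def reward(x, a):
    return x - 0.1 * a


T = make_random_bellman(samples, density, reward, ACTIONS, BETA)
values = solve_at_samples(T, samples)

# Self-approximation: evaluate the solution at a state that was not sampled.
print(T(values, 0.25))
```

The fixed point is computed only on the 30 sampled states, so each sweep costs a number of density evaluations proportional to the square of the sample size per action. Whether the required sample size stays polynomial in the dimension depends on the regularity conditions analysed in the cited paper, which this sketch does not check.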

Untitled

2002

Abstract. We present a kernel-based approach to reinforcement learning that overcomes the stability problems of temporal-difference learning in continuous state-spaces. First, our algorithm converges to a unique solution of an approximate Bellman's equation regardless of its initialization values. Second, the method is consistent in the sense that the resulting policy converges asymptotically to the optimal policy. Parametric value function estimates such as neural networks do not possess this property. Our kernel-based approach also allows us to show that the limiting distribution of the value function estimate is a Gaussian process. This information is useful in studying the bias-variance tradeoff in reinforcement learning. We find that all reinforcement learning approaches to estimating the value function, parametric or non-parametric, are subject to a bias. This bias is typically larger in reinforcement learning than in a comparable regression problem.

“…More details on the return to schooling can be found in Belzil and Hansen (2001). The predicted schooling attainments, along with actual frequencies are found in Table 1, and allow us to evaluate the goodness of fit. There is clear evidence that our model is capable of fitting the data well.…”

Section: Structural Estimates and Goodness Of Fit (mentioning)

confidence: 99%

Earnings Dispersion, Risk Aversion and Education

Belzil, Hansen

Research in Labor Economics

Customer Relationship Management: Maximizing Customer Lifetime Value

Lewis

2011

Wiley Encyclopedia of Operations Research and Management Science

An inherent difficulty in the practice of Customer Relationship Management is that the development of marketing policies that optimize the value of a firm's customer assets requires fairly sophisticated statistical and optimization techniques. This article reviews the existing academic literature focused on methods for optimizing customer lifetime value. The article covers key customer metrics and the use of dynamic programming to maximize customer value and reviews existing applications. The article also discusses limitations of the current body of research and opportunities for developing optimization models that employ rich models of consumer behavior.
