Self-Learning Control of Finite Markov Chains

Poznyak, Alexander S.; Najim, K.; Gómez-Ramı́rez, E.

doi:10.1201/9781482273274

Cited by 64 publications

(16 citation statements)

References 0 publications


“…Each iteration of formulas presented in Eq. (17) and Eq. (18) has a natural interpretation and involves three nonlinear equations, corresponding to evaluation of the three extraproximal operators.…”

Section: 3 Multi-period Portfolio Optimization Methods (mentioning)

confidence: 99%


Multiperiod Mean-Variance Customer Constrained Portfolio Optimization for Finite Discrete-Time Markov Chains

Dominguez, Clempner

2019

ECECSR


The multi-period formulation aims at selecting an optimal investment strategy over a time horizon that maximizes the final wealth while minimizing the risk, and at determining the exit time. This paper is dedicated to solving the multi-period mean-variance customer constrained Markowitz portfolio optimization problem by employing the extraproximal method, restricted to finite discrete-time, ergodic and controllable Markov chains over a finite time horizon. The extraproximal method can be considered a natural generalization of the convex programming approximation methods that largely simplifies the mathematical analysis and the economic interpretation of such model settings. We show that the multi-period mean-variance optimal portfolio can be decomposed into coupled nonlinear programming problems implementing the Lagrange principle, each having a clear economic interpretation. This decomposition is a multi-period representation of the single-period mean-variance customer portfolio, which naturally extends the basic economic intuition of the static Markowitz model (where the investment horizon is practically never known at the beginning of initial investment decisions). This implies that the corresponding multi-period mean-variance customer portfolio is determined by a system of equations in proximal format. Each equation in this system is a mean-variance optimization problem, which is solved using an iterated projection gradient method. Iterating these steps, we obtain a new quick procedure which leads to a simple and logically justified computational realization: at each iteration of the extraproximal method the functional of the mean-variance portfolio converges to an equilibrium point. We provide conditions for the existence of a unique solution to the portfolio problem by employing a regularized Lagrange function.
We present the convergence proof of the method and all the details needed to implement the extraproximal method in an efficient and numerically stable way. Finally, empirical results are provided to illustrate the suitability and practical performance of the model and the derived explicit portfolio strategy.


“…Let S be a finite set consisting of states {s_1, …, s_N}, N ∈ ℕ, called the state space. A stationary Markov chain [17,7] is a sequence of S-valued random variables s(n), n ∈ ℕ, satisfying the Markov condition:…”

Section: Homogeneous Markov Chains Model (mentioning)

confidence: 99%

Multiperiod Mean-Variance Customer Constrained Portfolio Optimization for Finite Discrete-Time Markov Chains

Dominguez, Clempner

2019

ECECSR


“…The assumption that the Markov chains are ergodic ensures that π_ij has a unique everywhere positive invariant distribution P_n [31] and, for a finite S, it is equivalent to the existence of some N ∈ ℕ such that (π_ij)^n > 0…”

Section: Remark (mentioning)

confidence: 99%
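
The ergodicity remark quoted above ties two things together for a finite state space: some power of the transition matrix is entrywise positive, and the chain has a unique, everywhere positive invariant distribution. The following is a minimal sketch of how one might check both numerically. It is not taken from the cited paper; the function names `primitivity_index` and `stationary_distribution` and the example matrix are illustrative assumptions.

```python
import numpy as np

def primitivity_index(P):
    """Return the smallest N such that every entry of P**N is positive,
    or None if no such N exists (hypothetical helper, not from the source)."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    # Work on the zero/nonzero pattern to avoid floating-point underflow.
    B = (P > 0).astype(np.int64)
    # Wielandt's bound: a primitive n x n matrix has a positive power
    # with exponent at most (n - 1)**2 + 1, so searching further is pointless.
    limit = (n - 1) ** 2 + 1
    Q = B.copy()
    for N in range(1, limit + 1):
        if Q.all():
            return N
        Q = ((Q @ B) > 0).astype(np.int64)
    return None

def stationary_distribution(P):
    """Solve p P = p with sum(p) = 1 in the least-squares sense.
    The solution is unique when the chain is ergodic."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p

# Illustrative 3-state chain (assumed data, rows sum to 1).
P_example = np.array([[0.5, 0.5, 0.0],
                      [0.0, 0.5, 0.5],
                      [0.5, 0.0, 0.5]])
# A periodic 2-state chain: irreducible, but no power is entrywise positive.
P_periodic = np.array([[0.0, 1.0],
                       [1.0, 0.0]])

N_example = primitivity_index(P_example)
p_example = stationary_distribution(P_example)
N_periodic = primitivity_index(P_periodic)
```

Checking the zero pattern rather than the numeric powers is a design choice: positivity of an entry depends only on which transitions are possible, not on their probabilities.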

“…Markov decision processes involve a popular framework for sequential decision-making in a random dynamic environment [31]. At each time step, an agent observes the state of the system of interest and chooses an action.…”

Section: Basics (mentioning)

confidence: 99%

Conforming coalitions in Markov Stackelberg security games: Setting max cooperative defenders vs. non-cooperative attackers

Clempner, Poznyak

2016

Applied Soft Computing


“…Available RL algorithms are in no means adequate. Theoretical studies prove convergence in only a few narrow special cases (see [14], [8]). Practical experience indicates that they generally do not achieve the Bellman optimality condition (that is the globally optimal solution, see [1]).…”

Section: Introduction (mentioning)

confidence: 99%

A Reinforcement Learning Method Based on Adaptive Simulated Annealing

Atiya, Parlos, Ingber

2003 46th Midwest Symposium on Circuits and Systems


Reinforcement learning is a hard problem, and the majority of existing algorithms suffer from poor convergence properties on difficult problems. In this paper we propose a new reinforcement learning method that utilizes the power of global optimization methods such as simulated annealing. Specifically, we use a particularly powerful version of simulated annealing called Adaptive Simulated Annealing (ASA) [3]. To this end we consider a batch formulation of the reinforcement learning problem, unlike the online formulation almost always used. The advantage of the batch formulation is that it allows state-of-the-art optimization procedures to be employed, and thus can lead to
