2005
DOI: 10.1002/int.20121
Dynamic pricing based on asymmetric multiagent reinforcement learning

Abstract: In this article, a dynamic pricing problem is solved using asymmetric multiagent reinforcement learning. In the problem, two competing brokers sell identical products to customers and compete on the basis of price. We model this dynamic pricing problem as a Markov game and solve it with two different learning methods. The first method utilizes modified gradient descent in the parameter space of the value function approximator, and the second method uses a direct gradient of the parameterize…
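
In standard notation (the article's own symbols may differ), the Markov-game model and the two gradient methods the abstract names take roughly the following forms; this is a sketch reconstructed from the abstract's description, not the article's exact algorithm:

    \[ \Gamma = \langle S,\, A^1,\, A^2,\, p,\, r^1,\, r^2 \rangle, \qquad
       p : S \times A^1 \times A^2 \to \Delta(S), \qquad
       r^i : S \times A^1 \times A^2 \to \mathbb{R} \]

    % Method 1: gradient descent in the parameter space \theta of a value
    % approximator V_\theta, toward a bootstrapped (TD-style) target.
    \[ \theta \leftarrow \theta + \alpha\,
       \bigl[ r_t + \gamma V_\theta(s_{t+1}) - V_\theta(s_t) \bigr]\,
       \nabla_\theta V_\theta(s_t) \]

    % Method 2: direct gradient ascent on a parameterized policy \pi_\omega
    % (REINFORCE-style), with return G_t.
    \[ \omega \leftarrow \omega + \alpha\, G_t\,
       \nabla_\omega \log \pi_\omega(a_t \mid s_t) \]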

Cited by 23 publications (12 citation statements). References 16 publications.
“…Kephart and Tesauro [21] studied the use of Q-Learning in a scenario where two competitive "pricebots" have to set the price of a commodity. Könönen [23] investigated a similar problem for the scenario in which one agent has the power to enforce its strategy on the other.…”
Section: Related Work
confidence: 99%
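
The Kephart and Tesauro scenario quoted above is easy to prototype. Below is a minimal, hypothetical sketch of two independent tabular Q-learning "pricebots" competing on a discrete price grid; the price grid, unit cost, and linear demand model are illustrative assumptions, not taken from either cited paper.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative setup (assumptions): 11 price levels, unit cost 0.4,
    # linear demand; the cheaper bot captures the whole market, ties split it.
    PRICES = np.linspace(0.5, 1.5, 11)
    N, COST = len(PRICES), 0.4

    def step(a1, a2):
        """One pricing round; returns each bot's profit."""
        p1, p2 = PRICES[a1], PRICES[a2]
        demand = lambda p: max(0.0, 2.0 - p)
        if p1 < p2:
            return (p1 - COST) * demand(p1), 0.0
        if p2 < p1:
            return 0.0, (p2 - COST) * demand(p2)
        return (p1 - COST) * demand(p1) / 2, (p2 - COST) * demand(p2) / 2

    # Independent Q-learning: each bot's state is the rival's last price index.
    Q1, Q2 = np.zeros((N, N)), np.zeros((N, N))
    alpha, gamma, eps = 0.1, 0.9, 0.1
    s1 = s2 = 0
    for t in range(50_000):
        a1 = rng.integers(N) if rng.random() < eps else int(Q1[s1].argmax())
        a2 = rng.integers(N) if rng.random() < eps else int(Q2[s2].argmax())
        r1, r2 = step(a1, a2)
        # Bootstrap against the rival's observed action (the next state).
        Q1[s1, a1] += alpha * (r1 + gamma * Q1[a2].max() - Q1[s1, a1])
        Q2[s2, a2] += alpha * (r2 + gamma * Q2[a1].max() - Q2[s2, a2])
        s1, s2 = a2, a1

    print("bot 1 settles near price", PRICES[int(Q1[s1].argmax())])
    print("bot 2 settles near price", PRICES[int(Q2[s2].argmax())])

Whether such pricebots settle on a stable price or cycle through "price wars" depends heavily on the demand model assumed above.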
“…Thus many advanced algorithms have been developed to accurately obtain these equilibria, such as correlated-Q (CE-Q) learning [15], which employs the correlated equilibrium solution with four variants so as to achieve empirical convergence to equilibrium policies on a testbed of general-sum Markov games. Asymmetric-Q [16] uses asymmetric multiagent reinforcement learning (MARL) to obtain a faster convergence rate than single-agent reinforcement learning on dynamic pricing problems. The same authors have also developed R(k) imitation learning [17] for AGC of interconnected power grids, and stochastic optimal relaxed AGC in a non-Markov environment based on multi-step Q(k) learning [18].…”
Section: Introduction
confidence: 99%
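
The asymmetric-Q scheme mentioned in the statement above orders the agents as leader and follower (a Stackelberg structure), which is also what "one agent has the power to enforce its strategy on the other" refers to in the earlier statement. A hedged sketch of the leader's update, in notation common to presentations of asymmetric MARL (the cited paper's exact formulation may differ): the follower best-responds to each committed leader action,

    \[ T(s', b) = \arg\max_{c}\, Q^2(s', b, c), \]

and the leader bootstraps against that best response:

    \[ Q^1(s, a^1, a^2) \leftarrow (1-\alpha)\, Q^1(s, a^1, a^2)
       + \alpha \bigl[ r^1 + \gamma \max_{b} Q^1\bigl(s', b, T(s', b)\bigr) \bigr]. \]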
“…The non-MDP, game-theory-based [23] multiagent system (MAS) stochastic game (MAS-SG) [24] was then developed to handle the complicated dynamic gaming and decision making among heterogeneous multiple agents. Based on game theory, many advanced algorithms have been employed to obtain system equilibria, such as correlated-Q (CE-Q) [25], asymmetric-Q [26], and their modifications [24]. The authors' previous work on single-agent reinforcement learning (SARL) and the MAS-SG has demonstrated that optimal AGC can be achieved when the number of agents is relatively small [27][28][29][30][31][32][33][34][35].…”
Section: Introduction
confidence: 99%
“…However, multiple equilibria may emerge as the number of agents increases, which inevitably takes longer owing to the extensive online calculation of all system equilibria, and may even lead to a severe collapse of system stability. Moreover, the aforementioned literature [23][24][25][26][27][28][29][30][31][32][33][34][35] only calculated an optimal total power reference, which was then dispatched to the adjustable capacity in a fixed proportion. In general, such static optimization may not yield an optimal dispatch.…”
Section: Introduction
confidence: 99%
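
For concreteness, the fixed-proportion dispatch criticized in the statement above amounts to the following rule (notation assumed here for illustration, not taken from the cited works):

    \[ \Delta P_i = \Delta P_{\mathrm{total}} \cdot \frac{C_i}{\sum_j C_j}, \]

where \(\Delta P_{\mathrm{total}}\) is the learned total power reference and \(C_i\) is unit i's adjustable capacity. The proportions \(C_i / \sum_j C_j\) never adapt to operating conditions, which is why the dispatch is static.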