Dynamic packaging in e-retailing with stochastic demand over finite horizons: A Q-learning approach

Cheng, Yan

doi:10.1016/j.eswa.2007.09.050
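The title frames the problem as selling under stochastic demand over a finite horizon with Q-learning. The sketch below is a minimal tabular illustration of that setting, not the paper's model. It simplifies to pricing a single stock rather than a package, and every concrete choice is a hypothetical assumption: the state is (period t, remaining stock), the actions are three fixed price levels, per-period demand is Poisson with a price-dependent mean, and unsold stock is worth nothing at the deadline. Because the horizon is finite, it keeps one Q-table per period, uses no discounting, and fixes the values at t = T to zero.

```python
import math
import random

# Hypothetical problem instance (assumption, not taken from the paper).
HORIZON = 10                 # selling periods t = 0, ..., HORIZON - 1
INITIAL_STOCK = 20           # units on hand at t = 0
PRICES = [5.0, 8.0, 12.0]    # discrete price levels = actions


def poisson(lam, rng):
    """Sample a Poisson(lam) variate (Knuth's multiplication method)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def expected_demand(price):
    """Hypothetical demand curve: higher prices attract fewer buyers."""
    return 6.0 * math.exp(-0.1 * price)


def greedy_action(q_row):
    """Index of the price with the highest Q value in this state."""
    return max(range(len(q_row)), key=lambda i: q_row[i])


def train(episodes=20000, alpha=0.1, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # q[t][stock][a]; the extra layer q[HORIZON] stays 0 (deadline reached).
    q = [[[0.0] * len(PRICES) for _ in range(INITIAL_STOCK + 1)]
         for _ in range(HORIZON + 1)]
    for _ in range(episodes):
        stock = INITIAL_STOCK
        for t in range(HORIZON):
            if stock == 0:
                break  # sold out: all remaining rewards are zero
            if rng.random() < epsilon:
                a = rng.randrange(len(PRICES))  # explore
            else:
                a = greedy_action(q[t][stock])  # exploit
            sold = min(stock, poisson(expected_demand(PRICES[a]), rng))
            reward = PRICES[a] * sold
            next_stock = stock - sold
            # Undiscounted finite-horizon target; q[t + 1][0] is never updated, so it stays 0.
            target = reward + max(q[t + 1][next_stock])
            q[t][stock][a] += alpha * (target - q[t][stock][a])
            stock = next_stock
    return q


if __name__ == "__main__":
    q = train()
    a = greedy_action(q[0][INITIAL_STOCK])
    print("learned opening price:", PRICES[a])
    print("estimated revenue:", round(q[0][INITIAL_STOCK][a], 2))
```

Indexing the table by period as well as stock is what makes the learned policy time-dependent: the same remaining stock may warrant a lower price close to the deadline than early in the season.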

Cited by 6 publications (3 citation statements)

References 19 publications (21 reference statements)


“…They considered an infinite horizon learning problem where there is no deadline for the sale of stock. Cheng [28] applied a QL algorithm for RL to solve dynamic pricing problems for selling a given stock with a finite horizon. The study investigated the pricing process and how an RL framework is used to set prices dynamically to adapt to uncertain demand and large-scale states.…”

Section: B. Reinforcement Learning For Dynamic Pricing (mentioning, confidence: 99%)

A Novel Dynamic Pricing Approach for Preemptible Cloud Services

Peng, Cheng (2023). IEEE Access. Self-citation.

Dynamic pricing for preemptible cloud services (DPPCS) is in high demand as a way to utilize the excess capacity in cloud computing effectively. However, excess capacity is highly non-stationary: it exhibits multi-temporal stochastic patterns with time-varying statistical properties. This non-stationarity makes the DPPCS problem a Non-Stationary Markov Decision Process (NSMDP) with unknown transition probabilities. Moreover, DPPCS is subject to a specified maximum preemption rate, which further turns the problem into a Constrained NSMDP (CNSMDP). We transform the CNSMDP into a piecewise Lagrangian dual model, which converts it into an unconstrained optimization problem. To solve this problem, we propose a novel Q-learning approach for DPPCS. We first present estimation methods for the unknown environment parameters, including a detection method for identifying temporal pattern changes and a diffusion approximation method for estimating the actual preemption rate. Then, we introduce a Lagrange multiplier updating method that balances revenue against the preemption rate in the reward function. Building on these methods, we develop a Constrained Non-Stationary Q-Learning (CNSQL) algorithm for DPPCS, which dynamically adjusts its learning process to adapt to the multi-temporal patterns. Simulated experiments show that the proposed approach is effective compared with state-of-the-art algorithms: it improves the revenue generated from excess capacity while keeping the actual preemption rate within the specified constraint.
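The Lagrange multiplier updating method mentioned in this abstract can be illustrated with a common primal-dual pattern for constrained MDPs. This is a minimal sketch under stated assumptions, not the paper's piecewise dual model or its diffusion-based rate estimate: the per-step reward subtracts λ times the excess of a preemption indicator over the allowed rate, and λ follows projected subgradient ascent driven by the observed preemption rate. The function names and the sample rates are hypothetical.

```python
def lagrangian_reward(revenue, preempted, lam, max_rate):
    """Hypothetical shaped reward: r - lam * (1[preempted] - max_rate).

    A Q-learner trained on this reward trades revenue against preemptions.
    """
    indicator = 1.0 if preempted else 0.0
    return revenue - lam * (indicator - max_rate)


def update_multiplier(lam, observed_rate, max_rate, step):
    """Projected subgradient ascent on the dual variable.

    lam grows while the constraint observed_rate <= max_rate is violated,
    shrinks when there is slack, and is clamped at zero.
    """
    return max(0.0, lam + step * (observed_rate - max_rate))


if __name__ == "__main__":
    lam = 0.0
    max_rate = 0.1
    # Hypothetical observed preemption rates from successive batches.
    for observed in [0.25, 0.20, 0.12, 0.08, 0.05]:
        lam = update_multiplier(lam, observed, max_rate, step=2.0)
        print(f"observed={observed:.2f} -> lambda={lam:.3f}")
```

Raising λ while the observed rate exceeds the cap makes preemption costlier in the reward, pushing the learner toward prices that preempt less; the clamp at zero keeps λ a valid multiplier for an inequality constraint.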


“…And in some examples of book [7], tabular-value function is used and excellent results were achieved. However, the size of the table may be considerable because of the excessive amount of memory needed to store the table [8]. In order to deal with the continuous state space, some approximation methods are taken.…”

Section: Introduction (mentioning, confidence: 99%)

Smooth Q-learning: Accelerate Convergence of Q-learning Using Similarity

Liao, Wei, Lai (2021). Preprint.

This paper proposes an improvement to Q-learning. Unlike classic Q-learning, the proposed method takes into account the similarity between different states and actions. During training, a new updating mechanism is used in which the Q values of similar state-action pairs are updated synchronously. The proposed method can be combined with both tabular Q-learning and deep Q-learning. Results on numerical examples show that the proposed method performs significantly better than classic Q-learning.
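The synchronized updating mechanism described in this abstract can be sketched for the tabular case. The abstract does not specify the similarity measure or how strongly similar pairs are updated, so this sketch assumes a hypothetical Gaussian kernel over integer state indices, treats only pairs that share the same action as similar, and gives each pair a share of the TD error scaled by its similarity to the visited pair. The visited pair has similarity 1, so it receives the ordinary Q-learning update.

```python
import math


def similarity(s1, a1, s2, a2, bandwidth=1.0):
    """Hypothetical kernel: zero across actions, Gaussian in state distance."""
    if a1 != a2:
        return 0.0
    return math.exp(-((s1 - s2) ** 2) / (2.0 * bandwidth ** 2))


def smooth_q_update(q, s, a, reward, s_next, alpha=0.1, gamma=0.9,
                    bandwidth=1.0, done=False):
    """One similarity-weighted Q-learning step on a list-of-lists table.

    The TD error is computed once for the visited pair (s, a), then every
    pair (s2, a2) moves by alpha * similarity * td_error.
    """
    target = reward if done else reward + gamma * max(q[s_next])
    td_error = target - q[s][a]
    for s2 in range(len(q)):
        for a2 in range(len(q[s2])):
            w = similarity(s, a, s2, a2, bandwidth)
            q[s2][a2] += alpha * w * td_error
    return td_error


if __name__ == "__main__":
    q = [[0.0, 0.0] for _ in range(5)]
    smooth_q_update(q, s=2, a=0, reward=1.0, s_next=3, alpha=0.5, done=True)
    for s, row in enumerate(q):
        print(s, [round(v, 3) for v in row])
```

With a very small `bandwidth` the kernel is effectively zero for every other state, and the update reduces to classic tabular Q-learning.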


“…Reinforcement learning offers the advantage of formulation of a mathematical model based on multiple variables without any pre-definition of non-linear structure of the model (Jiang and Sheng, 2009; Dorca et al., 2013). Applications of reinforcement learning in the context of expert systems include, among others, goal-regulation in manufacturing systems (Shin et al., 2012), real time rescheduling (Palombarini and Martinez, 2012), inventory control in supply chain management (Kwon et al., 2008; Jiang and Sheng, 2009), and real-time dynamic packaging for e-commerce (Cheng, 2009). Our research similarly uses the advantages of using a model-free approach offered by reinforcement learning algorithm but is applied in a different domain, i.e., the dynamic pricing of multiple interdependent products.…”

Section: Introduction (mentioning, confidence: 99%)