We consider a firm (e.g., a retailer) selling a single nonperishable product over a finite-period planning horizon. Demand in each period is stochastic and price-dependent, and unsatisfied demand is backlogged. At the beginning of each period, the firm determines its selling price and inventory replenishment quantity, but it knows neither the form of the demand's dependency on selling price nor the distribution of demand uncertainty a priori; hence, it has to make pricing and ordering decisions based on historical demand data. We propose a nonparametric data-driven policy that learns about the demand on the fly and, concurrently, applies the learned information to determine replenishment and pricing decisions. The policy integrates learning and action in the sense that the firm actively experiments with prices and inventory levels to collect demand information at the least possible profit loss. Besides convergence of the policy to the optimal one, we show that the regret, defined as the average profit loss compared with that of the optimal solution when the firm has complete information about the underlying demand, vanishes at the fastest possible rate as the planning horizon increases.
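The learn-while-earn idea in the abstract above can be illustrated with a minimal sketch. This is not the paper's actual policy: the shrinking exploration rate, the sample-average demand estimates, and the demand simulator are all simplifying assumptions, and holding/backlogging costs are omitted for brevity (with backlogging, demand is fully observed, so sample averages are unbiased).

```python
import random
import statistics

def run_policy(simulate_demand, prices, T, cost=1.0, seed=0):
    """Illustrative learn-while-earn loop (not the paper's policy):
    experiment with prices at a shrinking rate, otherwise charge the
    price with the best estimated per-period profit."""
    rng = random.Random(seed)
    samples = {p: [] for p in prices}   # observed demand at each price
    profit = 0.0
    for t in range(1, T + 1):
        if rng.random() < t ** -0.5 or not all(samples[p] for p in prices):
            price = rng.choice(prices)  # explore (rate decays like 1/sqrt(t))
        else:
            price = max(prices,
                        key=lambda p: (p - cost) * statistics.mean(samples[p]))
        d = simulate_demand(price)
        samples[price].append(d)        # backlogging: full demand observed
        profit += (price - cost) * d    # holding/backlog costs omitted
    return profit, samples
```

With a deterministic linear demand curve such as `lambda p: max(0, 10 - p)`, the loop quickly concentrates on the profit-maximizing test price while still exploring occasionally.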
We consider an inventory control problem with multiple products and stockout substitution. The firm knows neither the primary demand distribution for each product nor the customers' substitution probabilities between products a priori, and it needs to learn such information from sales data on the fly. One challenge in this problem is that the firm cannot distinguish between primary demand and substitution (overflow) demand in the sales data of any product, and lost sales are not observable. To circumvent these difficulties, we construct learning stages, each consisting of a cyclic exploration scheme and a benchmark exploration interval. The benchmark interval allows us to isolate the primary demand information from the sales data; this information is then combined with the sales data from the cyclic exploration intervals to estimate substitution probabilities. Because raising the inventory level helps obtain primary demand information but hinders substitution demand information, inventory decisions have to be carefully balanced to learn both together. We show that our learning algorithm admits a worst-case regret rate that (almost) matches the theoretical lower bound, and numerical experiments demonstrate that the algorithm performs very well. This paper was accepted by J. George Shanthikumar, big data analytics.
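The benchmark-versus-cyclic idea can be sketched for two products. The demand model below (Gaussian primary demands, Bernoulli overflow) and all parameters are hypothetical stand-ins, not the paper's model: when both products are amply stocked, each product's sales reveal its primary demand; when B is stocked out, A's extra sales relative to its benchmark mean estimate B's substitution rate.

```python
import random
import statistics

def simulate_period(rng, mean_a=4.0, mean_b=6.0, sub_prob=0.5, b_stocked=True):
    """Hypothetical two-product demand model: noisy primary demands;
    when B is stocked out, each unit of B's demand overflows to A
    with probability sub_prob."""
    d_a = max(0.0, rng.gauss(mean_a, 1.0))
    d_b = max(0.0, rng.gauss(mean_b, 1.0))
    if b_stocked:
        return d_a, d_b                 # ample stock: sales = primary demand
    overflow = sum(rng.random() < sub_prob for _ in range(round(d_b)))
    return d_a + overflow, 0.0          # A absorbs part of B's demand

def estimate_sub_prob(n=4000, seed=0):
    """Benchmark intervals isolate primary demands; cyclic intervals
    (B stocked out) reveal A's sales inflated by overflow from B."""
    rng = random.Random(seed)
    bench = [simulate_period(rng, b_stocked=True) for _ in range(n)]
    cyc = [simulate_period(rng, b_stocked=False)[0] for _ in range(n)]
    primary_a = statistics.mean(s[0] for s in bench)  # A's primary mean
    primary_b = statistics.mean(s[1] for s in bench)  # B's primary mean
    overflow = statistics.mean(cyc) - primary_a       # A's extra sales
    return overflow / primary_b                       # substitution estimate
```

The estimator recovers the planted substitution probability because the benchmark interval gives an uncontaminated estimate of A's primary demand to subtract out.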
We consider a joint pricing and inventory control problem in which the customer's response to selling price and the demand distribution are not known a priori. Unsatisfied demand is lost and unobserved, and the only available information for decision making is the observed sales data (also known as censored demand). Conventional approaches, such as stochastic approximation, online convex optimization, and continuum-armed bandit algorithms, cannot be employed, because neither the realized values of the profit function nor its derivatives are known. A major challenge is that the estimated profit function constructed from observed sales data is multimodal in price. We develop a nonparametric spline approximation–based learning algorithm. The algorithm separates the planning horizon into disjoint exploration and exploitation phases. During the exploration phase, a spline approximation of the demand-price function is constructed from sales data, and the corresponding surrogate optimization problem is then solved on a sparse grid to obtain a pair of recommended price and target inventory level. During the exploitation phase, the algorithm implements the recommended strategies. We establish a (nearly) square-root regret rate, which (almost) matches the theoretical lower bound.
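The explore-then-exploit structure can be sketched as follows. For brevity the sketch substitutes a global polynomial fit for the paper's spline, assumes inventory is set high enough during exploration that sales approximate demand (sidestepping censoring), and uses a hypothetical demand simulator; it is an illustration of the phase structure, not the authors' algorithm.

```python
import numpy as np

def explore_then_exploit(simulate_sales, price_grid, T_explore, T_exploit,
                         cost=1.0, degree=3, seed=0):
    """Exploration: cycle through grid prices and record sales (assumed
    uncensored here). Fit a polynomial surrogate of mean sales vs. price
    (a stand-in for the paper's spline), maximize surrogate profit on a
    fine grid, then charge that price for the exploitation phase."""
    rng = np.random.default_rng(seed)
    xs, ys = [], []
    for t in range(T_explore):
        p = price_grid[t % len(price_grid)]
        xs.append(p)
        ys.append(simulate_sales(p, rng))
    coef = np.polyfit(xs, ys, degree)               # surrogate demand curve
    fine = np.linspace(min(price_grid), max(price_grid), 200)
    d_hat = np.clip(np.polyval(coef, fine), 0.0, None)
    p_star = fine[np.argmax((fine - cost) * d_hat)]  # surrogate optimum
    revenue = sum(p_star * simulate_sales(p_star, rng)
                  for _ in range(T_exploit))
    return p_star, revenue
```

With a noiseless linear demand curve `10 - p` and unit cost 1, the surrogate optimum lands near the true profit-maximizing price of 5.5.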
Pricing and inventory replenishment are important operational decisions for firms such as retailers. To make these decisions effectively, a firm needs to know the demand distribution and its dependency on selling price, which is usually estimated from sales data at various testing price levels. Although more testing prices can lead to a better estimate of the demand–price relationship, frequent price changes are costly and come with adverse effects such as negative customer perception. In this article, data-driven algorithms are developed that learn the demand structure under constraints on the number of price changes. These algorithms are shown to converge to the optimal clairvoyant solution, and the convergence rates are the best possible in terms of profit loss.
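A budget on price changes can be respected by charging one price per phase. The sketch below is a generic illustration, not the article's algorithm: it uses geometrically growing phase lengths (an assumption) and a sample-average profit estimate, and it needs at least as many phases as candidate prices so every price gets tested once.

```python
import random
import statistics

def limited_change_pricing(simulate_demand, prices, T, m, cost=1.0, seed=0):
    """At most m price changes: split the horizon into m+1 phases with
    geometrically growing lengths, charge one price per phase, and open
    each phase with an untried price or the empirically best one.
    Assumes m + 1 >= len(prices) and T large relative to 2**m."""
    rng = random.Random(seed)
    lengths = [2 ** i for i in range(m + 1)]        # geometric phases
    scale = T / sum(lengths)
    lengths = [max(1, round(L * scale)) for L in lengths]
    lengths[-1] = T - sum(lengths[:-1])             # absorb rounding error
    samples = {p: [] for p in prices}
    profit, path = 0.0, []
    for length in lengths:
        untried = [p for p in prices if not samples[p]]
        if untried:
            price = untried[0]                      # test a new price
        else:
            price = max(prices,
                        key=lambda p: (p - cost) * statistics.mean(samples[p]))
        path.append(price)                          # one price per phase
        for _ in range(length):
            d = simulate_demand(price, rng)
            samples[price].append(d)
            profit += (price - cost) * d            # backlog costs omitted
    return profit, path
```

Because the price can only change between phases, the number of changes is at most m regardless of what the estimates say, and keeping the early (exploratory) phases short limits the profit lost to testing.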
We consider a single-product dynamic pricing problem with demand learning. The candidate prices may take any value in a wide price interval, and the modeling of the demand functions is nonparametric in nature, imposing only smoothness regularity conditions. One important aspect of our model is that the expected reward function may be nonconcave and indeed multimodal, which leads to many conceptual and technical challenges. Our proposed algorithm is inspired by both the Upper-Confidence-Bound algorithm for multiarmed bandits and the Optimism-in-the-Face-of-Uncertainty principle arising from linear contextual bandits. The multiarmed bandit formulation arises from a local-bin approximation of the unknown continuous demand function, and the linear contextual bandit formulation is then applied to obtain more accurate local polynomial approximators within each bin. Through rigorous regret analysis, we demonstrate that our proposed algorithm achieves optimal worst-case regret over a wide range of smooth function classes. More specifically, for k-times smooth functions and T selling periods, the regret of our proposed algorithm is [Formula: see text], which is shown to be optimal via the development of information-theoretic lower bounds. We also show that in special cases, such as strongly concave or infinitely smooth reward functions, our algorithm achieves an [Formula: see text] regret, matching the optimal regret established in previous works. Finally, we present computational results that verify the effectiveness of our method in numerical simulations. This paper was accepted by J. George Shanthikumar, big data analytics.
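The local-bin bandit formulation can be sketched with a zeroth-order variant: discretize the price interval into bins and run UCB1 on the bin centers, omitting the paper's local polynomial refinement within each bin. The revenue simulator, bin count, and confidence-bonus scaling below are illustrative assumptions, not the authors' specification.

```python
import math
import random

def ucb_pricing(simulate_revenue, low, high, n_bins, T, seed=0):
    """Bin-based UCB sketch: treat each price bin as a bandit arm
    (a piecewise-constant approximation of the continuous demand
    curve) and play UCB1 on the bin centers."""
    rng = random.Random(seed)
    centers = [low + (i + 0.5) * (high - low) / n_bins for i in range(n_bins)]
    counts = [0] * n_bins
    means = [0.0] * n_bins
    total = 0.0
    for t in range(1, T + 1):
        if t <= n_bins:
            i = t - 1                   # initialize: play each arm once
        else:
            i = max(range(n_bins),      # optimistic index: mean + bonus
                    key=lambda j: means[j]
                    + math.sqrt(2 * math.log(t) / counts[j]))
        r = simulate_revenue(centers[i], rng)
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]   # running average
        total += r
    return total, counts, centers
```

On a multimodal or concave revenue curve, the play counts concentrate on the bins containing the maximizer; the bin width (and hence n_bins) would be tuned to T and the assumed smoothness in a full treatment.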