We consider a firm (e.g., a retailer) selling a single nonperishable product over a finite planning horizon. Demand in each period is stochastic and price-dependent, and unsatisfied demand is backlogged. At the beginning of each period, the firm sets its selling price and inventory replenishment quantity. It knows a priori neither how demand depends on the selling price nor the distribution of the demand uncertainty, so it must make pricing and ordering decisions based on historical demand data. We propose a nonparametric data-driven policy that learns about demand on the fly and, concurrently, applies the learned information to its replenishment and pricing decisions. The policy integrates learning and action in the sense that the firm actively experiments with prices and inventory levels to collect demand information at the least possible profit loss. Besides showing that the policy's decisions converge to the optimal ones, we show that the regret, defined as the average profit loss relative to the optimal solution when the firm has complete information about the underlying demand, vanishes at the fastest possible rate as the planning horizon grows.
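To make the regret definition concrete, the sketch below writes it as R(T) = (1/T) * sum over t of (optimal-policy profit in period t minus learning-policy profit in period t), and pairs it with the per-period profit accounting of a backlogging system. The linear unit, holding, and backlog costs, the order-up-to parameterization, and the helper names `period_profit` and `average_regret` are illustrative assumptions, not the model or notation of the paper.

```python
from typing import Sequence, Tuple


def period_profit(price: float, demand: float, start_inv: float,
                  order_up_to: float, unit_cost: float,
                  hold_cost: float, backlog_cost: float) -> Tuple[float, float]:
    """One period of a backlogging system under hypothetical linear costs.

    Returns (profit, ending inventory). A negative ending inventory is backlog
    that is carried into the next period rather than lost.
    """
    if order_up_to < start_inv:
        # Excess stock cannot be returned, so the order-up-to level must be reachable.
        raise ValueError("order_up_to must be >= start_inv")
    order_qty = order_up_to - start_inv
    end_inv = order_up_to - demand
    holding = hold_cost * max(end_inv, 0.0)        # cost of leftover units
    shortage = backlog_cost * max(-end_inv, 0.0)   # penalty on backlogged units
    # With backlogging, all demand is eventually met, so revenue is price * demand.
    profit = price * demand - unit_cost * order_qty - holding - shortage
    return profit, end_inv


def average_regret(optimal_profits: Sequence[float],
                   policy_profits: Sequence[float]) -> float:
    """Average per-period profit loss of a policy versus the full-information optimum."""
    if len(optimal_profits) != len(policy_profits) or not optimal_profits:
        raise ValueError("need two non-empty sequences of equal length")
    total_loss = sum(o - p for o, p in zip(optimal_profits, policy_profits))
    return total_loss / len(optimal_profits)
```

For example, `period_profit(10, 5, 0, 8, 4, 1, 3)` gives a profit of 15 with 3 units left over, while the same decision facing a demand of 10 leaves 2 units backlogged. A policy whose regret vanishes is one for which `average_regret` tends to zero as the horizon length grows.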
We consider an inventory control problem with multiple products and stockout substitution. The firm knows a priori neither the primary demand distribution of each product nor the customers’ substitution probabilities between products, and it must learn this information from sales data on the fly. One challenge is that the sales data of any product do not distinguish primary demand from substitution (overflow) demand, and lost sales are not observable. To circumvent these difficulties, we construct learning stages, each consisting of a cyclic exploration scheme and a benchmark exploration interval. The benchmark interval allows us to isolate primary demand information from the sales data, and this information is then compared against the sales data from the cyclic exploration intervals to estimate the substitution probabilities. Because raising inventory levels helps reveal primary demand but hinders the observation of substitution demand, inventory decisions must be carefully balanced to learn both simultaneously. We show that our learning algorithm admits a worst-case regret rate that (almost) matches the theoretical lower bound, and numerical experiments demonstrate that the algorithm performs very well. This paper was accepted by J. George Shanthikumar, big data analytics.
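The idea of isolating primary demand in a benchmark interval and then comparing it against sales from exploration intervals can be illustrated with a simple moment estimator. The sketch assumes, hypothetically, that during the benchmark interval every product is stocked high enough that nothing stocks out (so sales equal primary demand), and that during product j's cyclic interval product j is withheld entirely while the other products still do not stock out. The excess sales of product i over its primary demand, divided by product j's primary demand, then estimate the probability that a customer who wanted j substitutes to i. This simplification and the name `estimate_substitution` are illustrative assumptions, not the paper's algorithm.

```python
from statistics import fmean
from typing import Dict, List, Sequence


def estimate_substitution(benchmark_sales: Sequence[Sequence[float]],
                          cyclic_sales: Dict[int, Sequence[Sequence[float]]]
                          ) -> List[List[float]]:
    """Estimate q[j][i] = P(customer whose first choice j is unavailable buys i).

    benchmark_sales: per-period sales vectors with no stockouts (assumed to equal
        primary demand).
    cyclic_sales: for each withheld product j, per-period sales vectors observed
        while j has zero stock and all other products have ample stock (assumed).
    """
    n = len(benchmark_sales[0])
    # Mean primary demand per product, taken from the benchmark interval.
    primary = [fmean(row[i] for row in benchmark_sales) for i in range(n)]
    q = [[0.0] * n for _ in range(n)]
    for j, rows in cyclic_sales.items():
        if primary[j] <= 0:
            continue  # no primary demand for j, so nothing overflows from it
        for i in range(n):
            if i == j:
                continue
            # Extra sales of i while j is withheld are attributed to overflow from j.
            overflow = fmean(row[i] for row in rows) - primary[i]
            # Clip to [0, 1] because sampling noise can push the ratio outside.
            q[j][i] = min(max(overflow / primary[j], 0.0), 1.0)
    return q
```

With primary demands of 10, 20, and 30 and sales of 23 and 34 for products 1 and 2 while product 0 is withheld, the estimated substitution probabilities from product 0 are 0.3 and 0.4. The sketch also shows the tension described above: the benchmark estimate needs high stock, whereas the overflow estimate needs deliberate stockouts.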