We consider bandit problems involving a large (possibly infinite) collection of arms, in which the expected reward of each arm is a linear function of an r-dimensional random vector Z ∈ ℝ^r, where r ≥ 2. The objective is to minimize the cumulative regret and Bayes risk. When the set of arms corresponds to the unit sphere, we prove that the regret and Bayes risk are of order Θ(r√T), by establishing a lower bound for an arbitrary policy and showing that a matching upper bound is obtained through a policy that alternates between exploration and exploitation phases. The phase-based policy is also shown to be effective if the set of arms satisfies a strong convexity condition. For the case of a general set of arms, we describe a near-optimal policy whose regret and Bayes risk admit upper bounds of the form O(r√T log^{3/2} T).
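The alternation between exploration and exploitation phases on the unit sphere can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not taken from the abstract: the exploration arms are the standard basis vectors, cycle c exploits for c rounds, the noise is Gaussian, and the name `phased_policy_regret` is hypothetical.

```python
import math
import random


def phased_policy_regret(z, horizon, noise_sd=1.0, seed=0):
    """Cumulative regret of a simple explore/exploit cycle policy.

    Arms are unit vectors u with expected reward u.z (z is unknown to the policy).
    Assumed schedule: cycle c plays each basis vector once, then plays the
    greedy arm c times. This schedule is illustrative only.
    """
    rng = random.Random(seed)
    r = len(z)
    best = math.sqrt(sum(c * c for c in z))  # max of u.z over unit vectors u
    sums = [0.0] * r
    counts = [0] * r
    regret = 0.0
    t = 0
    cycle = 0
    while t < horizon:
        cycle += 1
        # Exploration phase: play each standard basis vector e_k once.
        for k in range(r):
            if t >= horizon:
                break
            sums[k] += z[k] + rng.gauss(0.0, noise_sd)  # noisy reward of e_k
            counts[k] += 1
            regret += best - z[k]
            t += 1
        # With basis-vector arms, least squares reduces to per-coordinate means.
        est = [sums[k] / counts[k] if counts[k] else 0.0 for k in range(r)]
        norm = math.sqrt(sum(c * c for c in est))
        if norm > 0:
            arm = [c / norm for c in est]  # greedy arm on the unit sphere
        else:
            arm = [1.0] + [0.0] * (r - 1)  # arbitrary fallback arm
        # Exploitation phase: play the greedy arm `cycle` times.
        for _ in range(cycle):
            if t >= horizon:
                break
            regret += best - sum(a * c for a, c in zip(arm, z))
            t += 1
    return regret
```

With noiseless rewards the estimate is exact after the first cycle, so all remaining regret comes from the exploration rounds; with noise, the growing exploitation phases are what trade estimation error against exploration cost.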
The paper considers a stylized model of a dynamic assortment optimization problem, where given a limited capacity constraint, we must decide the assortment of products to offer to customers to maximize the profit. Our model is motivated by the problem faced by retailers of stocking products on a shelf with limited capacities and by the problem of placing a limited number of ads on a web page. We assume that each customer chooses to purchase the product (or to click on the ad) that maximizes her utility. We use the multinomial logit choice model to represent demand. However, we do not know the demand for each product. We can learn the demand distribution by offering different product assortments, observing resulting selections, and inferring the demand distribution from past selections and assortment decisions. We present an adaptive policy for joint parameter estimation and assortment optimization. To evaluate our proposed policy, we define a benchmark profit as the maximum expected profit that we can earn if we know the underlying demand distribution in advance. We show that the running average expected profit generated by our policy converges to the benchmark profit and establish its convergence rate. Numerical experiments based on sales data from an online retailer indicate that our policy performs well, generating over 90% of the optimal profit after less than two days of sales.
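As a small illustration of the multinomial logit model under a capacity constraint, the sketch below computes purchase probabilities for an offered assortment and finds the best assortment by brute-force enumeration. The preference weights `v`, profit margins `w`, and the convention that the no-purchase option has weight 1 are assumptions for this example; the adaptive estimation policy from the abstract is not reproduced here.

```python
from itertools import combinations


def mnl_choice_probs(assortment, v):
    """Purchase probability of each offered product under the MNL model.

    v[i] is the (assumed known) preference weight of product i; the
    no-purchase option is normalized to weight 1.
    """
    denom = 1.0 + sum(v[i] for i in assortment)
    return {i: v[i] / denom for i in assortment}


def expected_profit(assortment, v, w):
    """Expected profit per customer when `assortment` is offered."""
    probs = mnl_choice_probs(assortment, v)
    return sum(w[i] * p for i, p in probs.items())


def best_assortment(v, w, capacity):
    """Enumerate all assortments of size at most `capacity` (small n only)."""
    best_set, best_val = (), 0.0
    for k in range(1, capacity + 1):
        for s in combinations(range(len(v)), k):
            val = expected_profit(s, v, w)
            if val > best_val:
                best_set, best_val = s, val
    return best_set, best_val
```

The value returned by `best_assortment` with the true weights plays the role of the benchmark profit: what could be earned if the demand distribution were known in advance.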
We consider a stylized dynamic pricing model in which a monopolist prices a product to a sequence of T customers, who independently make purchasing decisions based on the price offered according to a general parametric choice model. The parameters of the model are unknown to the seller, whose objective is to determine a pricing policy that minimizes the regret, which is the expected difference between the seller's revenue and the revenue of a clairvoyant seller who knows the values of the parameters in advance, and always offers the revenue-maximizing price. We show that the regret of the optimal pricing policy in this model is Θ(√T), by establishing an Ω(√T) lower bound on the worst-case regret under an arbitrary policy, and presenting a pricing policy based on maximum likelihood estimation whose regret is O(√T) across all problem instances. Furthermore, we show that when the demand curves satisfy a "well-separated" condition, the T-period regret of the optimal policy is Θ(log T). Numerical experiments show that our policies perform well.
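One way to write the regret described above is sketched below. The notation is assumed for illustration and does not come from the abstract: θ denotes the unknown parameter vector, d(p; θ) the probability that a customer buys at price p, p_t the price offered to customer t, and p* the clairvoyant's revenue-maximizing price.

```latex
\mathrm{Regret}(T)
  \;=\; \mathbb{E}\!\left[\sum_{t=1}^{T}
        \Bigl(p^{*}\,d(p^{*};\theta) - p_t\,d(p_t;\theta)\Bigr)\right],
\qquad
p^{*} \in \arg\max_{p}\; p\,d(p;\theta).
```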
We consider assortment optimization problems under the multinomial logit model, where the parameters of the choice model are random. The randomness in the choice model parameters is motivated by the fact that there are multiple customer segments, each with different preferences for the products, and the segment of each customer is unknown to the firm when the customer makes a purchase. This choice model is also called the mixture-of-logits model. The goal of the firm is to choose an assortment of products to offer that maximizes the expected revenue per customer, across all customer segments. We establish that the problem is NP-complete even when there are just two customer segments. Motivated by this complexity result, we focus on assortments consisting of products with the highest revenues, which we refer to as revenue-ordered assortments. We identify specially structured cases of the problem where revenue-ordered assortments are optimal. When the randomness in the choice model parameters does not follow a special structure, we derive tight approximation guarantees for revenue-ordered assortments. We extend our model to the multi-period capacity allocation problem, and prove that, when restricted to the revenue-ordered assortments, the mixture-of-logits model possesses the nesting-by-fare-order property. This result implies that revenue-ordered assortments can be incorporated into existing revenue management systems through nested protection levels. Numerical experiments show that revenue-ordered assortments perform remarkably well, generally yielding profits that are within a fraction of a percent of the optimal.
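The idea of restricting attention to revenue-ordered assortments can be sketched as follows. The representation of segments as `(weight, preference_weights)` pairs, the no-purchase weight of 1, and the function names are assumptions made for this illustration.

```python
def mixture_revenue(assortment, revenues, segments):
    """Expected revenue per customer under a mixture of logits.

    segments is a list of (weight, v) pairs: `weight` is the segment's
    probability and v[i] its MNL preference weight for product i
    (no-purchase option normalized to weight 1).
    """
    total = 0.0
    for weight, v in segments:
        denom = 1.0 + sum(v[i] for i in assortment)
        total += weight * sum(revenues[i] * v[i] for i in assortment) / denom
    return total


def best_revenue_ordered(revenues, segments):
    """Best assortment among the n candidates 'top-k products by revenue'."""
    order = sorted(range(len(revenues)), key=lambda i: -revenues[i])
    best_set, best_val = [], 0.0
    for k in range(1, len(order) + 1):
        candidate = order[:k]
        val = mixture_revenue(candidate, revenues, segments)
        if val > best_val:
            best_set, best_val = candidate, val
    return best_set, best_val
```

Only n candidate assortments are evaluated instead of 2^n subsets, which is what makes the restriction attractive given the hardness result. With a single segment the search reduces to the plain multinomial logit case.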
We study stochastic inventory planning with lost sales and instantaneous replenishment, where, contrary to classical inventory theory, knowledge of the demand distribution is not available. Furthermore, we observe only the sales quantity in each period, and lost sales are unobservable, that is, demand data are censored. The manager must make an ordering decision in each period based only on historical sales data. Excess inventory is either perishable or carried over to the next period. In this setting, we propose non-parametric adaptive policies that generate ordering decisions over time. We show that the T-period average expected cost of our policy differs from the benchmark newsvendor cost (the minimum expected cost that would have been incurred if the manager had known the underlying demand distribution) by at most O(1/√T).
Introduction
The problem of inventory control and planning has received much interest from practitioners and academics from the early years of operations research. The early literature in this area modeled demand as deterministic and having known quantities, but it soon became apparent that deterministic modeling was often inadequate, and uncertainty needed to be incorporated in modeling future demand. As a result, a majority of the papers on inventory theory during the past fifty years employ stochastic demand models. In these models, future demand is given by a specific exogenous random variable, and the inventory decisions are made with full knowledge of the future demand distribution. In many applications, however, the demand distribution is not known a priori. Even when past data have been collected, the selection of the most appropriate distribution and its parameters remains ambiguous.
In the case when excess demand is lost, the information available to the inventory manager is further limited since she does not observe the realized demand but only observes the sales quantity (often referred to as censored demand), which is the smaller of the stocking level and the realized demand. Motivated by these realistic constraints, we develop a non-parametric approach to stochastic inventory planning in the presence of lost sales and censored demand.
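To make the censoring mechanism and the newsvendor benchmark concrete, here is a minimal sketch for a discrete demand distribution. The per-unit holding cost `h`, the per-unit lost-sales penalty `b`, and the function names are assumptions for illustration; the adaptive policy itself is not reproduced.

```python
def censored_sales(demand, stock):
    """What the manager observes: the smaller of realized demand and stock."""
    return min(demand, stock)


def newsvendor_cost(y, demand_pmf, h, b):
    """Expected one-period cost of stocking y units.

    demand_pmf maps demand values to probabilities; h is the assumed
    per-unit overage (holding) cost and b the assumed per-unit lost-sales cost.
    """
    return sum(
        p * (h * max(y - d, 0) + b * max(d - y, 0))
        for d, p in demand_pmf.items()
    )


def benchmark_order(demand_pmf, h, b):
    """Stocking level with minimum expected cost when the pmf is known."""
    candidates = sorted(demand_pmf)
    return min(candidates, key=lambda y: newsvendor_cost(y, demand_pmf, h, b))
```

For example, with demand uniform on {0, 1, 2, 3}, `h = 1`, and `b = 2`, the benchmark stocks 2 units, matching the smallest level whose cumulative probability reaches the critical ratio b / (b + h) = 2/3. A manager who stocks 2 units and faces a demand of 3 observes only sales of 2.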
Using the well-known product-limit form of the Kaplan-Meier estimator from statistics, we propose a new class of nonparametric adaptive data-driven policies for stochastic inventory control problems. We focus on the distribution-free newsvendor model with censored demands. The assumption is that the demand distribution is not known and there are only sales data available. We study the theoretical performance of the new policies and show that for discrete demand distributions they converge almost surely to the set of optimal solutions. Computational experiments suggest that the new policies converge for general demand distributions, not necessarily discrete, and demonstrate that they are significantly more robust than previously known policies. As a by-product of the theoretical analysis, we obtain new results on the asymptotic consistency of the Kaplan-Meier estimator for discrete random variables that extend existing work in statistics. To the best of our knowledge, this is the first application of the Kaplan-Meier estimator within an adaptive optimization algorithm, in particular, the first application to stochastic inventory control models. We believe that this work will lead to additional applications in other domains.
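The product-limit form of the Kaplan-Meier estimator can be sketched on censored sales data as follows. The data layout is an assumption for this example: each observation is a pair `(value, observed)`, where `observed` is true when sales fell below the stocking level (so demand was seen exactly) and false when sales equaled the stocking level (so demand is only known to be at least `value`). How the papers' policies use the estimate is not shown.

```python
def kaplan_meier(observations):
    """Product-limit estimate of the survival function P(D > t).

    observations: list of (value, observed) pairs; observed=False marks a
    censored point. Returns {t: estimate of P(D > t)} at each uncensored value t.
    """
    event_values = sorted({v for v, observed in observations if observed})
    survival = {}
    s = 1.0
    for t in event_values:
        # Observations still "at risk" at t: value at least t.
        at_risk = sum(1 for v, _ in observations if v >= t)
        # Uncensored observations exactly equal to t.
        events = sum(1 for v, observed in observations if v == t and observed)
        s *= 1.0 - events / at_risk
        survival[t] = s
    return survival
```

With no censored points the estimate coincides with the empirical survival function; censored points leave the estimate unchanged at their own value but shrink the at-risk set for larger values.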