A problem is considered where jobs arrive at random times and assume random values, or importance. These must be assigned to a fixed set of men whose qualities are different but known. As each job arrives, its value is observed and the decision-maker must decide which man, if any, to assign to this job. If a job arrives at time t and its value is observed to be x, then by assigning man i with quality p_i, a reward r(t)p_i x is received, where r(t) is a discount function. The object is to find an assignment policy which maximizes the expected reward from the available men. The problem is analyzed for different arrival distributions and for different discount functions, but in all cases, the optimal policies are shown to have fairly simple forms, independent of the actual qualities of the men, the p_i's. Other interpretations of the model, besides the men and jobs interpretation, are also given. The paper concludes with a similar model which does not, however, include time as an explicit parameter.
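The reward structure above can be illustrated with a short Monte Carlo sketch. All parameters here (Poisson arrivals, uniform job values, exponential discounting, and a single fixed acceptance threshold) are illustrative choices of mine, and the simple threshold rule shown is not the paper's optimal policy:

```python
import math
import random

def simulate(p, rate=1.0, a=0.1, threshold=0.5, horizon=50.0, seed=0):
    """Simulate the assignment model: jobs arrive at Poisson times with
    values X ~ U(0,1); assigning man i (quality p_i) to a job of value x
    at time t earns r(t) * p_i * x with discount r(t) = exp(-a*t)."""
    rng = random.Random(seed)
    men = sorted(p, reverse=True)      # assign the best remaining man first
    t, reward = 0.0, 0.0
    while men:
        t += rng.expovariate(rate)     # next Poisson arrival time
        if t > horizon:
            break
        x = rng.random()               # observed job value
        if x >= threshold:             # accept only sufficiently good jobs
            reward += math.exp(-a * t) * men.pop(0) * x
    return reward

total = simulate([0.9, 0.6, 0.3])
```

Because later assignments are discounted, the policy trade-off is between assigning a good man now and saving him for a possibly better job later; the paper's result is that the optimal acceptance regions do not depend on the p_i's themselves.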
The problem of choosing the one best or several best of a set of sequentially observed random variables has been treated by many authors. For example, the seller of a house has this problem when deciding which bids on the house to accept and which to reject. We assume that the bids are identically distributed random variables and at most n can be observed. Each bid is accepted or rejected when received; a bid rejected now cannot be accepted later on. The object is to maximize the expected value of the bid actually accepted. Unlike most previous authors, we examine the case where one or more parameters of the common underlying distribution are unknown and information on these is updated in a Bayesian manner as the successive random variables are observed. Using the properties of location and scale parameters, an explicit form for the optimal policy is found when the underlying distribution is normal, uniform, or gamma and the prior is from the natural conjugate family. Simulation results concerning sensitivity of the value obtained to the amount and correctness of the prior information for these three families are then presented.
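In the baseline case where the bid distribution is fully known (the paper's Bayesian machinery handles the unknown-parameter case), the optimal policy follows from backward induction: with k bids still to come, the value V[k] satisfies V[k] = E[max(X, V[k-1])] with V[0] = 0, and a bid x is accepted iff it exceeds the value of continuing. A minimal sketch for Uniform(0, 1) bids, where E[max(X, c)] = (1 + c^2)/2:

```python
def uniform_thresholds(n):
    """Backward-induction values for the house-selling problem with
    bids X ~ Uniform(0, 1) and at most n observations.
    V[k] = E[max(X, V[k-1])] = (1 + V[k-1]**2) / 2, with V[0] = 0.
    Accept a bid x when k bids remain iff x >= V[k-1]."""
    V = [0.0]
    for _ in range(n):
        V.append((1 + V[-1] ** 2) / 2)
    return V

V = uniform_thresholds(3)
# V == [0.0, 0.5, 0.625, 0.6953125]
```

With unknown parameters, as in the paper, the continuation value depends on the posterior as well as the number of bids remaining, which is where the location-scale structure and conjugate priors come in.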
This paper examines monotonicity results for a fairly general class of partially observable Markov decision processes. When there are only two actual states in the system and when the actions taken are primarily intended to improve the system, rather than to inspect it, we give reasonable conditions which ensure that the optimal reward function and the optimal action are both monotone in the current state of information. Examples of maintenance systems and advertising systems for which our results hold are given. Finally, we examine the case where there are three or more actual states and indicate the difficulties encountered when we attempt to extend the monotonicity results to this situation.
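The two-state case can be made concrete with a small numerical experiment. The following sketch uses a hypothetical maintenance model of my own (parameters and reward structure are illustrative, not taken from the paper): the machine is working or failed, the state is unobserved, and the decision-maker's information state is the belief b = P(failed). Value iteration over a discretized belief space then exhibits the monotone structure the paper describes, with the value nonincreasing in b and a threshold (hence monotone) optimal action:

```python
import numpy as np

THETA = 0.2    # chance a working machine fails each period
REPAIR = 0.5   # repair cost; repair restores the machine to working
BETA = 0.9     # discount factor

grid = np.linspace(0.0, 1.0, 101)   # discretized belief space, b = P(failed)

def value_iteration(iters=600):
    V = np.zeros_like(grid)
    for _ in range(iters):
        # "operate": earn 1 only if working; belief drifts toward failure
        b_next = grid + (1 - grid) * THETA
        q_operate = (1 - grid) + BETA * np.interp(b_next, grid, V)
        # "repair": pay the cost, run the repaired machine for one period
        q_repair = (1 - REPAIR) + BETA * np.interp(THETA, grid, V)
        V = np.maximum(q_operate, q_repair)
    policy = np.where(q_operate >= q_repair, "operate", "repair")
    return V, policy

V, policy = value_iteration()
# V is nonincreasing in b, and the policy switches once from "operate"
# to "repair" as b grows: both are monotone in the information state.
```

This is exactly the improve-versus-inspect distinction in the abstract: the "repair" action improves the system, and monotonicity holds; actions that merely generate observations need not preserve it.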