Abstract: One often encounters the curse of dimensionality when applying dynamic programming to determine optimal policies for controlled Markov chains. In this paper, we use restricted linear programming to construct sub-optimal policies, together with a bound on the deviation of such a policy from the optimum. The novelty of this approach lies in circumventing the need for a value iteration or a linear program defined on the entire state-space. Instead, the state-space is partitioned based on the reward structure, and the optimal cost-to-go (value function) is approximated by a constant over each partition. We associate a meta-state with each partition; the transition probabilities between these meta-states can be derived from the original Markov chain specification. This state aggregation significantly reduces the computational burden and lends itself to a restricted linear program defined on the aggregated state-space. Finally, the proposed method is benchmarked on a perimeter surveillance stochastic control problem.
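The aggregation step described above can be illustrated with a small sketch. It is not the paper's algorithm: it assumes rewards depend on the state only, that states with equal reward form one partition, and that states within a partition are weighted uniformly when deriving meta-state transition probabilities. For brevity, the small aggregated problem is solved by discounted value iteration as a stand-in for the restricted linear program, and no sub-optimality bound is computed. All names (`aggregate`, `solve_aggregated`, `greedy_policy`) and the toy chain are hypothetical.

```python
from collections import defaultdict


def aggregate(P, r):
    """Partition states by reward and build meta-state transition matrices.

    P[a][s][t] is the probability of moving from s to t under action a;
    r[s] is the reward in state s (assumed to depend on the state only).
    """
    groups = defaultdict(list)
    for s, reward in enumerate(r):
        groups[reward].append(s)
    parts = [groups[k] for k in sorted(groups)]
    index = {s: i for i, part in enumerate(parts) for s in part}
    m = len(parts)
    P_agg = []
    for Pa in P:
        Q = [[0.0] * m for _ in range(m)]
        for i, part in enumerate(parts):
            w = 1.0 / len(part)  # assumption: uniform weight within a partition
            for s in part:
                for t, p in enumerate(Pa[s]):
                    Q[i][index[t]] += w * p
        P_agg.append(Q)
    r_agg = [r[part[0]] for part in parts]
    return parts, index, P_agg, r_agg


def solve_aggregated(P_agg, r_agg, gamma=0.9, iters=500):
    """Value iteration on the meta-states (stand-in for the restricted LP)."""
    m = len(r_agg)
    v = [0.0] * m
    for _ in range(iters):
        v = [
            r_agg[i] + gamma * max(sum(Q[i][j] * v[j] for j in range(m)) for Q in P_agg)
            for i in range(m)
        ]
    return v


def greedy_policy(P, index, v):
    """Greedy policy on the original states w.r.t. the piecewise-constant value."""
    n = len(P[0])
    return [
        max(range(len(P)), key=lambda a: sum(P[a][s][t] * v[index[t]] for t in range(n)))
        for s in range(n)
    ]


# Toy chain: 4 states on a cycle, action 0 = stay, action 1 = step forward.
P = [
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
    [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]],
]
r = [0, 0, 1, 1]

parts, index, P_agg, r_agg = aggregate(P, r)
v = solve_aggregated(P_agg, r_agg)
policy = greedy_policy(P, index, v)
print(parts, v, policy)
```

Because the value is constant over each partition, the greedy policy cannot distinguish states inside a partition by their distance to a rewarding state; this is the kind of sub-optimality that a deviation bound is meant to quantify.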
The optimal control of a "blind" UAV searching for a target moving on a road network and heading at a known speed toward a set of goal vertices is considered. To aid the UAV, some roads in the network have been instrumented with Unattended Ground Sensors (UGSs) that detect the target's passage. When the UAV arrives at an instrumented node, the UGS there informs the UAV if and when the target visited the node. In addition, the UAV can choose to wait (loiter) for an arbitrary time at any UGS location. At time 0, the target passes by an entry node on his way towards one of the exit nodes. The UAV arrives at this entry node after some delay and is thus informed of the target's presence in the network, whereupon the chase is on: the UAV is tasked with capturing the target. Because the UAV is blind, capture requires the UAV and the target to be collocated at a UGS location. If this happens, the UGS is triggered and this information is instantaneously relayed to the UAV, thereby enabling capture. On the other hand, if the target reaches one of the exit nodes without being captured, he is deemed to have escaped. For a given initial delay, we compute the pursuit policy, if it exists, that achieves capture in minimum time under worst-case target actions.