Abstract: Solving large Markov decision processes accurately and quickly is challenging. Because the computational effort involved is considerable, current research focuses on finding better acceleration techniques. For instance, the convergence properties of current solution methods depend, to a great extent, on the order of backup operations. On one hand, algorithms such as topological sorting can find good orderings, but their overhead is usually high. On the other hand, shortest-path methods such as Dijkstra's algorithm, which is based on priority queues, have been applied successfully to deterministic shortest-path Markov decision processes. Here, we propose an improved value iteration algorithm based on Dijkstra's algorithm for solving shortest-path Markov decision processes. Experimental results on a stochastic shortest-path problem show the feasibility of our approach.
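As an illustration of the idea in the abstract above, the following is a minimal sketch of value iteration in which a priority queue decides the order of backups, loosely in the spirit of Dijkstra's algorithm. It is not the authors' algorithm. The MDP encoding (a dict of actions with costs and outcome lists), the function names, and the choice of the current value estimate as the priority are all assumptions made for this example. It also assumes strictly positive costs and that every state can reach the goal.

```python
import heapq
from collections import defaultdict
from itertools import count


def bellman_backup(state, values, mdp):
    """Minimum expected cost-to-go over the actions available in `state`."""
    return min(
        cost + sum(prob * values[nxt] for nxt, prob in outcomes)
        for cost, outcomes in mdp[state]
    )


def prioritized_value_iteration(mdp, goal, epsilon=1e-9):
    """Back up states in order of increasing value estimate (assumed priority).

    `mdp` maps each non-goal state to a list of actions; every action is a
    pair (cost, [(next_state, probability), ...]) with cost > 0.
    """
    # Predecessor sets: which states can reach a given state in one step.
    predecessors = defaultdict(set)
    for state, actions in mdp.items():
        for _cost, outcomes in actions:
            for nxt, _prob in outcomes:
                predecessors[nxt].add(state)

    # Zero is a lower bound on the cost-to-go when all costs are positive.
    values = {state: 0.0 for state in mdp}
    values[goal] = 0.0

    tie = count()  # tie-breaker so that states themselves are never compared
    queue = [(0.0, next(tie), goal)]
    while queue:
        _priority, _, state = heapq.heappop(queue)
        # A change in `state` can only affect the states that lead into it.
        for pred in predecessors[state]:
            if pred == goal:
                continue
            new_value = bellman_backup(pred, values, mdp)
            if abs(new_value - values[pred]) > epsilon:
                values[pred] = new_value
                heapq.heappush(queue, (new_value, next(tie), pred))
    return values


if __name__ == "__main__":
    # Hypothetical three-state problem: "G" is the goal.
    example = {
        "A": [(1.0, [("B", 1.0)]), (5.0, [("G", 1.0)])],
        "B": [(1.0, [("G", 0.5), ("B", 0.5)])],
    }
    print(prioritized_value_iteration(example, "G"))
```

Starting from the goal and backing up only the predecessors of states whose values changed avoids sweeping over the whole state space on every pass. This is the kind of saving that ordering-based acceleration aims for.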
Planning and scheduling processes in Artificial Intelligence have traditionally been coupled in a very rigid way. The former selects the actions required to achieve the stated goals, and the latter studies the execution requirements (time and resources) of those actions. However, real-world problems require the capabilities of both processes. Two ways of addressing these problems were found in the state of the art: i) the extended planning approach; ii) the extended scheduling approach. Because both showed major drawbacks, it was necessary to provide a model that interleaves the two processes in a way that is flexible (alternating the capabilities of both processes) and general (applicable to any domain and any problem). This article presents a proposed integrated model, emphasizing the key points of this approach: its structure and how the two processes interact.
In this paper, we present a new approach for the estimation of Markov decision processes based on efficient association rule mining techniques such as Apriori. To speed up the solution of the resulting association-rule-based Markov decision process, several acceleration procedures, such as asynchronous updates and prioritization using a static ordering, have been applied. A new criterion for state reordering in decreasing order of maximum reward is also compared with a modified topological reordering algorithm. Experimental results obtained on a stochastic shortest-path problem with finite state and action spaces demonstrate the feasibility of the new approach.
This article presents a new acceleration method for solving Markov decision processes. The classical value iteration algorithm has solved these stochastic processes satisfactorily, but the algorithm and its accelerated variants have been slow for discount factors close to one, and their convergence properties have depended largely on a good ordering of state updates. It was recently shown that value iteration achieves good convergence speed when combined with an improved topological ordering algorithm. The drawback of that algorithm, however, is its memory requirements. Here we present a different method for obtaining a good ordering of state updates with lower memory requirements. Experimental results obtained on a stochastic shortest-path problem are also presented.
In this paper we propose combining accelerated variants of value iteration with improved prioritized sweeping for the fast solution of stochastic shortest-path Markov decision processes. Value iteration is a classical algorithm for solving Markov decision processes, but this algorithm and its variants are quite slow on considerably large problems. To improve the solution time, acceleration techniques such as asynchronous updates, prioritization, and prioritized sweeping are explored in this paper. A topological reordering algorithm was also compared with static reordering. Experimental results obtained on stochastic shortest-path problems with finite state and action spaces show that our approach achieves a considerable reduction in solution time with respect to the tested variants of value iteration. For instance, one test showed a 5.7-fold reduction with respect to value iteration with asynchronous updates.
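To make the term "asynchronous updates" in the abstract above concrete, here is a small sketch that contrasts synchronous value iteration with asynchronous (in-place, Gauss-Seidel style) value iteration under a fixed static ordering of states. It does not reproduce the paper's prioritized sweeping or topological reordering. The MDP encoding, the function names, and the chain-shaped test problem are assumptions introduced only for this example.

```python
def value_iteration(mdp, goal, order, in_place, epsilon=1e-8):
    """Sweep over `order` until no value changes by more than `epsilon`.

    in_place=True  -> asynchronous updates: each backup immediately sees
                      the values written earlier in the same sweep.
    in_place=False -> synchronous updates: every backup in a sweep reads
                      the values left by the previous sweep.
    Returns (values, number_of_sweeps).
    """
    values = {state: 0.0 for state in mdp}
    values[goal] = 0.0
    sweeps = 0
    while True:
        sweeps += 1
        # Synchronous mode reads from a frozen snapshot of the last sweep.
        source = values if in_place else dict(values)
        max_change = 0.0
        for state in order:
            new_value = min(
                cost + sum(prob * source[nxt] for nxt, prob in outcomes)
                for cost, outcomes in mdp[state]
            )
            max_change = max(max_change, abs(new_value - values[state]))
            values[state] = new_value
        if max_change <= epsilon:
            return values, sweeps


def chain_mdp(n, slip=0.1):
    """Hypothetical test problem: states 0..n-1 in a line, goal is state n.

    The single action costs 1, advances with probability 1 - slip, and
    stays in place with probability slip.
    """
    return {i: [(1.0, [(i + 1, 1.0 - slip), (i, slip)])] for i in range(n)}


if __name__ == "__main__":
    mdp = chain_mdp(5)
    order = [4, 3, 2, 1, 0]  # static ordering: states nearest the goal first
    _, async_sweeps = value_iteration(mdp, 5, order, in_place=True)
    _, sync_sweeps = value_iteration(mdp, 5, order, in_place=False)
    print("asynchronous sweeps:", async_sweeps)
    print("synchronous sweeps:", sync_sweeps)
```

With an ordering that starts next to the goal, the in-place variant lets cost information travel across the whole chain within a single sweep, whereas the synchronous variant moves it only one state per sweep. This is why update order matters so much for the asynchronous variants.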
We study experimentally and numerically the transient behavior of a (2+1)D beam when it is totally reflected by a nonlinear interface formed by an SBN61:Ce photorefractive crystal. The dynamics give rise to new beams. Owing to modulation instability of the beam, the nonlinear interface causes the beam to break up into new beams that are reflected at different angles.