2018
DOI: 10.1109/tac.2017.2747409

Stochastic Online Shortest Path Routing: The Value of Feedback

Cited by 48 publications (51 citation statements)
References 22 publications
“…This method uses one path for exploitation and possibly another path for exploration. Talebi et al. [17] proposed to model the path planning process as a Markov decision process (MDP), which coincides with our model in this paper, and proposed the KL-Hop-by-Hop Routing (KL-HHR) algorithm. However, their method has to be combined with a line search and the Bellman-Ford algorithm to choose the next node on a path.…”
Section: A. Related Work
Mentioning, confidence: 62%
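The excerpt above describes choosing the next hop by running Bellman-Ford over learned edge-delay estimates. Below is a minimal illustrative sketch of that idea, assuming Bernoulli-type per-edge delays and a KL-UCB-style lower confidence bound computed by bisection; the graph representation, the counters, and every function name here are assumptions for illustration, not the KL-HHR algorithm of [17].

```python
# Sketch: optimistic per-edge delay estimates feed a Bellman-Ford pass
# that returns the next hop toward the destination. Illustrative only.
import math

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def optimistic_delay(mean, count, t):
    """Smallest q <= mean with count*KL(mean, q) <= log(t): an optimistic
    (lower-confidence) estimate of an edge's mean delay."""
    if count == 0:
        return 0.0                      # unexplored edges look maximally good
    rhs = math.log(max(t, 2)) / count
    lo, hi = 0.0, mean
    for _ in range(50):                 # bisection on the convex KL constraint
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) > rhs:
            lo = mid
        else:
            hi = mid
    return hi

def next_hop(edges, stats, src, dst, t):
    """Bellman-Ford on optimistic edge delays; returns the first hop of the
    optimistic shortest path from src to dst.

    edges: list of (u, v); stats: {(u, v): (empirical_mean_delay, count)}.
    """
    nodes = {u for u, v in edges} | {v for u, v in edges}
    dist = {n: math.inf for n in nodes}
    succ = {n: None for n in nodes}
    dist[dst] = 0.0
    for _ in range(len(nodes) - 1):     # relax all edges toward the destination
        for u, v in edges:
            mean, count = stats[(u, v)]
            w = optimistic_delay(mean, count, t)
            if dist[v] + w < dist[u]:
                dist[u] = dist[v] + w
                succ[u] = v
    return succ[src]
```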
“…In this section, we compare the performance of the proposed algorithms with the KL-Hop-by-Hop Routing (KL-HHR) algorithm [17], the combinatorial upper confidence bound (CUCB) algorithm [33], and the Thompson sampling (TS) algorithm [34] applied to this problem. The details of the KL-HHR algorithm can be found in [17]. The CUCB algorithm chooses a whole path for the nth packet according to the following strategy [33]…”
Section: Experimental Results and Analysis
Mentioning, confidence: 99%
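The exact selection rule from [33] is truncated in the excerpt above, so the following is only a generic CUCB-style sketch: each edge cost receives an optimistic lower-confidence adjustment, and an exact shortest-path oracle (Dijkstra here) returns the whole path for the n-th packet. The bonus term, the data structures, and all names are assumptions, not the strategy quoted in the citing paper.

```python
# Sketch: CUCB-style path selection with adjusted edge costs and a
# shortest-path oracle. Illustrative only.
import heapq
import math

def cucb_path(adj, stats, src, dst, n):
    """Pick a full src->dst path for the n-th packet.

    adj:   {u: [(v, edge_id), ...]} adjacency list
    stats: {edge_id: (empirical_mean_cost, play_count)}
    """
    def adjusted_cost(edge_id):
        mean, count = stats[edge_id]
        if count == 0:
            return 0.0                           # force exploration of unseen edges
        bonus = math.sqrt(3 * math.log(max(n, 2)) / (2 * count))
        return max(mean - bonus, 0.0)            # optimistic (lower) cost estimate

    # Dijkstra on the adjusted costs (valid because they are non-negative).
    dist = {src: 0.0}
    parent = {}
    heap = [(0.0, src)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == dst:
            break
        for v, eid in adj.get(u, []):
            nd = d + adjusted_cost(eid)
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    if dst != src and dst not in parent:
        return None                              # no path found
    # Reconstruct the chosen path.
    path, node = [dst], dst
    while node != src:
        node = parent[node]
        path.append(node)
    return list(reversed(path))
```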
“…Even more broadly, the multi-agent version of classical bandit applications is a natural extension [38]. For example, in the online shortest path routing problem [34,39], another classic example of a bandit application, a multi-agent setting could capture the case in which the underlying network is large and each agent is responsible for routing within a sub-graph of the network. Last, it is also plausible to have asynchronous learning among the different agents, in the sense that each agent has its own action rate for decision making.…”
Section: Motivating Application
Mentioning, confidence: 99%
“…Over the past several years, much research has been done to reduce the negative impact of latency on the transmission and communication activities of the fog-IoT model. Talebi et al. [1] studied how to find the shortest path for online routing in a multi-hop network. Wang et al. [2] proposed a method that finds the optimal service placement of micro clouds, thereby reducing the average cost over time.…”
Section: Related Work
Mentioning, confidence: 99%