2019
DOI: 10.1109/access.2019.2950055
Reinforcement Learning Based Stochastic Shortest Path Finding in Wireless Sensor Networks

Abstract: Many factors influence the connection states between nodes of a wireless sensor network, such as physical distance and network load, making edge lengths dynamic in many scenarios. This dynamic property means the network essentially forms a graph with stochastic edge lengths. In this paper, we study the stochastic shortest path problem on a directed graph with stochastic edge lengths, using reinforcement learning algorithms. We regard each edge length as a random variable following unknown…
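The abstract's core idea, treating each edge length as a random variable and learning the shortest path from sampled traversals, can be illustrated with a minimal tabular Q-learning sketch. This is not the authors' implementation: the graph, the Gaussian length distributions, the choice of Q-learning, and the hyper-parameters below are all illustrative assumptions.

```python
import random
from collections import defaultdict

# Hedged sketch: tabular Q-learning on a small directed graph whose edge
# lengths are sampled from (assumed) Gaussian distributions each traversal.
GRAPH = {
    "A": {"B": lambda: random.gauss(2.0, 0.5), "C": lambda: random.gauss(4.0, 1.0)},
    "B": {"C": lambda: random.gauss(1.0, 0.2), "D": lambda: random.gauss(5.0, 1.5)},
    "C": {"D": lambda: random.gauss(2.0, 0.3)},
    "D": {},  # destination (absorbing)
}
SOURCE, DEST = "A", "D"
ALPHA, GAMMA, EPS, EPISODES = 0.1, 1.0, 0.1, 5000

Q = defaultdict(float)  # Q[(node, next_node)] estimates negative expected remaining length

def choose(node):
    """Epsilon-greedy choice among outgoing edges."""
    succ = list(GRAPH[node])
    if random.random() < EPS:
        return random.choice(succ)
    return max(succ, key=lambda n: Q[(node, n)])

for _ in range(EPISODES):
    s = SOURCE
    while s != DEST:
        a = choose(s)
        cost = GRAPH[s][a]()          # sample the stochastic edge length
        r = -cost                     # reward = negative length
        if a == DEST:
            target = r                # terminal: no future value
        else:
            target = r + GAMMA * max(Q[(a, n)] for n in GRAPH[a])
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])  # Q-learning update
        s = a

# Greedy roll-out after learning approximates the path of minimal expected length.
path, s = [SOURCE], SOURCE
while s != DEST:
    s = max(GRAPH[s], key=lambda n: Q[(s, n)])
    path.append(s)
print("estimated shortest path:", " -> ".join(path))
```

With the distributions assumed above, the learned greedy path is A -> B -> C -> D (expected length about 5), shorter in expectation than A -> C -> D (about 6).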

Cited by 31 publications (17 citation statements)
References 35 publications (40 reference statements)
“…If the current policy is used to select actions for such an update, the procedure is called "on-policy learning". For instance, in the SARSA method [40], [41], which is an on-policy learning method, the state-action value function is updated as follows…”
Section: B. Off-Policy TD Learning
confidence: 99%
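The excerpt breaks off before the equation. For the reader's convenience, the standard SARSA update that the quoted text refers to (a textbook form, not reproduced from the citing paper) is:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \big[ r_{t+1} + \gamma\, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \big],$$

where the next action $a_{t+1}$ is drawn from the same policy being evaluated, which is what makes SARSA on-policy.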
“…In most cases, a stochastic (e.g., random) policy is selected as the behaviour policy to ensure sufficient exploration of new states. One of the most widely used off-policy methods is Q-learning [41], [42], [44], [46], which updates the value function using the Bellman optimality equation as follows…”
Section: B. Off-Policy TD Learning
confidence: 99%
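This excerpt also stops before the equation. The standard Q-learning update based on the Bellman optimality equation (again a textbook form, not taken from the citing paper) is:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \big[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \big],$$

where the maximisation over $a'$ is independent of the action the behaviour policy actually executes next, which is what makes Q-learning off-policy.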
“…There are different ways to address this problem, including genetic algorithms [4], ant colony optimisation [5], [6], reinforcement learning [7], and nearest-neighbour optimisation [8], [9]. However, unlike classical shortest path problems, we treat this as a stochastic shortest path problem [10], since the AUV must consider the energy cost of a path in addition to the path length. This is because a shorter path may be more expensive than a longer one if the AUV needs to make many turns to follow it…”
Section: Introductionmentioning
confidence: 99%
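To make the last quoted point concrete, here is a minimal, hypothetical cost model: the turn penalty, the waypoints, and the `path_energy` helper are illustrative assumptions, not taken from the cited AUV work. It shows how a geometrically shorter path can cost more energy once turns are charged.

```python
import math

# Hypothetical energy model: energy = travelled distance + TURN_COST per heading change.
TURN_COST = 3.0  # illustrative constant, not from the cited paper

def path_energy(waypoints):
    """Total energy of a polyline path: segment lengths plus a fixed
    penalty for every change of heading between consecutive segments."""
    dist, headings = 0.0, []
    for (x0, y0), (x1, y1) in zip(waypoints, waypoints[1:]):
        dist += math.hypot(x1 - x0, y1 - y0)
        headings.append(math.atan2(y1 - y0, x1 - x0))
    turns = sum(1 for h0, h1 in zip(headings, headings[1:]) if abs(h1 - h0) > 1e-9)
    return dist + TURN_COST * turns

zigzag = [(0, 0), (1, 1), (2, 0), (3, 1), (4, 0)]   # shorter (~5.66) but 3 turns
detour = [(0, 0), (0, 2), (4, 2), (4, 0)]           # longer (8.0) but 2 turns

print(path_energy(zigzag))  # ~5.66 + 9 = 14.66
print(path_energy(detour))  # 8.0 + 6 = 14.0  -> the longer path is cheaper
```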