2009 Chinese Control and Decision Conference
DOI: 10.1109/ccdc.2009.5194820
RL-based superframe order adaptation algorithm for IEEE 802.15.4 networks

Abstract: In wireless sensor networks, an important problem is adjusting the work time window in each working/sleeping period so as to save energy under light network loads and decrease packet delay under heavy network loads. In this paper, we introduce a reinforcement learning method for this problem. We discuss the algorithm design in a simple IEEE 802.15.4 network, where an RL-based adaptive algorithm is proposed. Simulation results show that this RL-based algorithm can adapt to the change of data flow and mak…
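The abstract describes the approach only at a high level. The following is a minimal sketch of how an RL-based superframe order (SO) adaptation of this kind could look, assuming a tabular Q-learning agent whose state is a coarse queue-occupancy level at the coordinator and whose action is the SO for the next beacon interval. The candidate SO range, the reward weighting, and all names below are illustrative assumptions, not the algorithm from the paper.

```python
import random
from collections import defaultdict

# Illustrative sketch (not the paper's algorithm): tabular Q-learning that
# picks the IEEE 802.15.4 superframe order (SO) for the next beacon interval.
# State: coarse queue-occupancy level observed by the coordinator.
# Action: candidate SO value (larger SO -> longer active period).
SO_VALUES = [0, 1, 2, 3, 4]          # assumed candidate superframe orders
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = defaultdict(float)               # Q[(state, so)] -> estimated value

def choose_so(state):
    """Epsilon-greedy selection of the next superframe order."""
    if random.random() < EPSILON:
        return random.choice(SO_VALUES)
    return max(SO_VALUES, key=lambda so: Q[(state, so)])

def reward(energy_used, avg_delay, w_energy=1.0, w_delay=1.0):
    """Assumed reward: penalize both energy consumption and packet delay."""
    return -(w_energy * energy_used + w_delay * avg_delay)

def update(state, so, r, next_state):
    """Standard one-step Q-learning update."""
    best_next = max(Q[(next_state, a)] for a in SO_VALUES)
    Q[(state, so)] += ALPHA * (r + GAMMA * best_next - Q[(state, so)])
```

In this sketch the coordinator would call choose_so() once per beacon interval, observe the resulting energy use and average delay, compute reward(), and call update() before the next interval.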

Cited by 1 publication (6 citation statements) | References: 9 publications
“…(iii) In comparison to the RL based approaches [18,22,23] for transmission scheduling at the MAC layer, we would like to point out that the algorithms proposed there (a) employ full state representations; (b) consider discrete state-action spaces (except [23] which adapts Q-learning for continuous actions, albeit with a discrete state space); (c) consider an MDP with perfect information, i.e., a setting where the states are fully observable; (d) consider only a discounted setting, which is not amenable for studying steady state system behaviour; (e) are primarily concerned with managing transmission in an energy-efficient manner and not with tracking an intruder with high accuracy. In other words, the algorithms of [18,22,23] are not applicable in our setting as we consider a partially observable MDP with continuous state-action spaces, and 1 A short version of this paper containing only the average cost setting and algorithms and with no proofs is available in [24]. The current paper includes in addition: (i) algorithms for the discounted cost setting; (ii) a detailed proof of convergence of the average cost algorithm using theory of stochastic recursive inclusions; and (iii) detailed numerical experiments.…”
Section: Related Work (mentioning; confidence: 98%)
“…(iv) Many RL based approaches proposed earlier for sleep scheduling (see [18,22,23,29]) employ full state representations and hence, they are not scalable to larger networks owing to the curse of dimensionality. We employ efficient linear approximators to alleviate this.…”
Section: Related Work (mentioning; confidence: 99%)
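The contrast this citing paper draws between full state representations and linear approximators can be made concrete with a short sketch: instead of storing one Q-value per discrete state, Q(s, a) is approximated as a weighted sum of a few features, so the number of learned parameters is fixed by the feature vector length rather than growing with the network size. The feature choices and update rule below are generic assumptions, not taken from any of the cited papers.

```python
import numpy as np

# Illustrative sketch of Q-learning with linear function approximation:
# Q(s, a) ~ w . phi(s, a), so the parameter count depends on the feature
# vector length, not on the (potentially huge) discrete state space.
N_FEATURES = 4
w = np.zeros(N_FEATURES)
ALPHA, GAMMA = 0.05, 0.9

def phi(state, action):
    """Assumed hand-crafted features, e.g. queue level, traffic rate, action."""
    queue_level, traffic_rate = state
    return np.array([1.0, queue_level, traffic_rate, float(action)])

def q_value(state, action):
    return float(w @ phi(state, action))

def td_update(state, action, r, next_state, actions):
    """Semi-gradient TD(0) update of the weight vector."""
    global w
    best_next = max(q_value(next_state, a) for a in actions)
    td_error = r + GAMMA * best_next - q_value(state, action)
    w += ALPHA * td_error * phi(state, action)
```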