2014
DOI: 10.1007/s11276-014-0762-6

Two timescale convergent Q-learning for sleep-scheduling in wireless sensor networks

Abstract: In this paper, we consider an intrusion detection application for Wireless Sensor Networks. We study the problem of scheduling the sleep times of the individual sensors, where the objective is to maximize the network lifetime while keeping the tracking error to a minimum. We formulate this problem as a partially observable Markov decision process (POMDP) with continuous state-action spaces, in a manner similar to Fuemmeler and Veeravalli (IEEE Trans Signal Process 56(5):2091–2101, 2008). …
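As a rough illustration of the setup the abstract describes, the sketch below pairs a fast-timescale TD update of linearly parameterized Q-values with a slower averaged iterate; this is the standard two-timescale construction behind convergence arguments for Q-learning with function approximation, not necessarily the authors' exact algorithm. The feature map, action set, and state fields are hypothetical placeholders.

```python
# Minimal two-timescale Q-learning sketch with linear function approximation.
# All modeling choices (features, actions, state fields) are assumptions.
import numpy as np

N_FEATURES = 6
ACTIONS = [0, 1, 2, 4]            # candidate sleep durations in slots (assumed)
GAMMA = 0.95

theta = np.zeros(N_FEATURES)      # fast iterate: Q-function parameters
theta_bar = np.zeros(N_FEATURES)  # slow iterate: averaged parameters

def features(state, action):
    """Hypothetical feature map phi(s, a) over energy and tracking error."""
    phi = np.zeros(N_FEATURES)
    phi[0] = 1.0                        # bias term
    phi[1] = state["energy"]            # remaining battery fraction
    phi[2] = state["track_err"]         # current tracking-error estimate
    phi[3] = action / max(ACTIONS)      # normalized sleep duration
    phi[4] = phi[1] * phi[3]            # energy x sleep interaction
    phi[5] = phi[2] * phi[3]            # error x sleep interaction
    return phi

def q_value(state, action, w):
    return float(w @ features(state, action))

def two_timescale_update(n, s, a, r, s_next):
    """Process one transition: fast TD step on theta, slow averaging step."""
    global theta, theta_bar
    a_n = 1.0 / (n + 1) ** 0.6          # fast step size
    b_n = 1.0 / (n + 1)                 # slow step size; b_n / a_n -> 0
    best_next = max(q_value(s_next, u, theta) for u in ACTIONS)
    delta = r + GAMMA * best_next - q_value(s, a, theta)   # TD error
    theta = theta + a_n * delta * features(s, a)           # fast update
    theta_bar = theta_bar + b_n * (theta - theta_bar)      # slow update
```

Because b_n decays faster than a_n, the slow iterate sees the fast one as quasi-static, which is what lets the two coupled recursions be analyzed separately.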

Cited by 4 publications (3 citation statements) · References: 29 publications
“…Specifically, the forward propagation algorithms apply the carefully trained weight matrices and bias vectors for carrying out the associated linear and activation operations. By contrast, the backward propagation algorithms, which are widely used in the industrial field, define a so-called loss function for quantifying the difference between the output produced from the training samples and the real output.…”

Accompanying table ([294]–[303]):

Ref.  | Algorithm                | Network             | Application
[294] | reduced-state SARSA      | cellular network    | dynamic channel allocation considering both mobile traffic and call handoffs
[295] | on-policy SARSA          | CR network          | distributed multiagent sensing policy relying on local interactions among SUs
[296] | on-policy SARSA          | MANET               | energy-aware reactive routing protocol for maximizing network lifetime
[297] | on-policy SARSA          | HetNet              | resource management for maximizing resource utilization and guaranteeing QoS
[298] | approximate SARSA        | P2P network         | energy-harvesting-aided power allocation policy for maximizing the throughput
[299] | Q-learning               | WBAN                | power control scheme to mitigate interference and to improve throughput
[300] | Q-learning               | OFDM system         | adaptive modulation and coding not relying on off-line training from the PHY
[301] | Q-learning               | cooperative network | efficient relay selection scheme meeting the symbol error rate requirement
[302] | decentralized Q-learning | CR network          | aggregated interference control without introducing signaling overhead
[303] | convergent Q-learning    | WSN                 | sensors' sleep scheduling scheme for minimizing the tracking error
Section: Deep Learning In Wireless Network
confidence: 99%
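The table in the excerpt above repeatedly contrasts on-policy SARSA with off-policy Q-learning; the sketch below isolates the single line where the two update rules differ. The dictionary-based Q-table and the default step size and discount are illustrative assumptions.

```python
# SARSA (on-policy) vs. Q-learning (off-policy): same TD template,
# different bootstrap target. Q maps (state, action) -> value.
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.95):
    # On-policy: bootstrap on the action the agent actually takes next.
    target = r + gamma * Q.get((s_next, a_next), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    # Off-policy: bootstrap on the greedy action, regardless of behavior.
    target = r + gamma * max(Q.get((s_next, u), 0.0) for u in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
```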
“…Moreover, owing to the extensive research on machine learning, many studies have upgraded and extended the algorithm so that it can be widely used in WSNs [22]. In [23], the authors proposed a two-timescale Q-learning algorithm with function approximation to alleviate the curse of dimensionality. Although all of the above algorithms alleviate the state-explosion problem, the action-explosion problem must also be solved to obtain a scalable solution.…”
Section: Related Work
confidence: 99%
“…Otherwise, this action will be punished. The corresponding reward or penalty is responsible for adjusting the weight parameters of the deep neural network [23].…”
Section: Scheduling Strategy For MTT-WSNs
confidence: 99%
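To make the reward/penalty mechanism in the excerpt concrete, the sketch below performs one semi-gradient TD step on a tiny NumPy Q-network: the scalar reward (or negative penalty) enters the TD target, and the resulting error is backpropagated to adjust the weights. The two-layer architecture and all dimensions are assumptions for illustration, not the design of [23].

```python
# One semi-gradient TD update of a tiny Q-network; the reward or penalty
# shapes the target, and the TD error drives the weight adjustment.
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(scale=0.1, size=(16, 4))   # hidden weights (assumed shape)
W2 = rng.normal(scale=0.1, size=(3, 16))   # one output Q-value per action

def forward(x):
    h = np.maximum(0.0, W1 @ x)            # ReLU hidden layer
    return W2 @ h, h                       # Q-values and hidden activations

def td_update(x, action, reward, x_next, lr=0.01, gamma=0.95):
    """Gradient step on 0.5 * delta**2; the target is held fixed."""
    global W1, W2
    q, h = forward(x)
    q_next, _ = forward(x_next)
    target = reward + gamma * q_next.max() # reward or penalty enters here
    delta = q[action] - target             # TD error
    grad_W2 = np.zeros_like(W2)
    grad_W2[action] = delta * h            # only the taken action's row
    grad_pre = (delta * W2[action]) * (h > 0)  # backprop through ReLU
    grad_W1 = np.outer(grad_pre, x)
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1
```

A rewarded action pulls its predicted Q-value toward a higher target, while a penalized action pulls it down, which is the weight-adjustment behavior the excerpt describes.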