2019
DOI: 10.1109/jiot.2018.2872440

Reinforcement Learning-Based Multiaccess Control and Battery Prediction With Energy Harvesting in IoT Systems

Abstract: Energy harvesting (EH) is a promising technique to enable long-term, self-sustainable operation of Internet of Things (IoT) systems. In this paper, we study the joint access control and battery prediction problems in a small-cell IoT system comprising multiple EH user equipments (UEs) and one base station (BS) with limited uplink access channels. Each UE has a rechargeable battery with finite capacity. The system control is modeled as a Markov decision process without complete prior knowledge assumed …
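The abstract models the system as an MDP over the UEs' finite-capacity batteries. As a minimal sketch of what one battery-state transition might look like (the dynamics, energy units, and arrival distribution below are illustrative guesses, not the paper's exact model):

```python
import random

CAPACITY, TX_COST = 10, 2  # illustrative energy units, not from the paper

def battery_step(b, transmit, rng):
    """One MDP transition for a single EH UE's battery (assumed model):
    spend energy if the UE transmits on an uplink channel, then add a
    random harvested amount, clipped to the finite battery capacity."""
    if transmit and b >= TX_COST:
        b -= TX_COST
    harvested = rng.choice([0, 1, 2])  # stochastic energy arrivals
    return min(b + harvested, CAPACITY)

rng = random.Random(0)
b, trace = 5, []
for t in range(4):
    b = battery_step(b, transmit=(t % 2 == 0), rng=rng)
    trace.append(b)
```

Because the arrival process is unknown a priori, an RL agent would learn the access policy from such sampled transitions rather than from the transition kernel.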


Cited by 113 publications (48 citation statements)
References 39 publications (78 reference statements)
“…Moreover, the proposed architecture, algorithm, and mechanism are also promising for other large-scale network control problems in future networks. For example, for the access control problem in the IoT system studied in [34], one could first cluster the IoT devices into groups and then determine an access control policy for each group. Meanwhile, the strategy of learning under multiple behavior policies and the safeguard mechanism can be exploited to further improve the learning efficiency and the online performance.…”
Section: Discussion (mentioning)
confidence: 99%
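The cluster-then-control idea in this statement can be sketched as follows. This is a hypothetical illustration (the clustering feature, k-means variant, and per-group rule are all assumptions, not the cited method): group devices by a scalar feature such as battery level, then attach one access rule per group.

```python
import random

def cluster_devices(battery_levels, k=2, iters=25, seed=0):
    """Tiny 1-D k-means over per-device battery levels (stdlib only)."""
    rng = random.Random(seed)
    centers = rng.sample(battery_levels, k)
    assign = [0] * len(battery_levels)
    for _ in range(iters):
        # assignment step: nearest center
        for n, b in enumerate(battery_levels):
            assign[n] = min(range(k), key=lambda j: abs(b - centers[j]))
        # update step: center = mean of its members
        for j in range(k):
            members = [b for n, b in enumerate(battery_levels) if assign[n] == j]
            if members:
                centers[j] = sum(members) / len(members)
    return assign, centers

def group_policy(center, threshold=0.5):
    """Illustrative per-group rule: only energy-rich groups contend."""
    return "transmit" if center >= threshold else "stay_idle"

levels = [0.9, 0.85, 0.8, 0.2, 0.15, 0.1]  # normalized battery states
assign, centers = cluster_devices(levels, k=2)
policies = {j: group_policy(c) for j, c in enumerate(centers)}
```

Determining one policy per cluster instead of per device is what makes the approach scale to large device populations.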
“…Moreover, the users' content request distributions and data correlation change over time. Traditional learning approaches such as [26] must re-run the learning process whenever the users' content request distribution and data correlation change. However, the ESN-based transfer RL algorithm can transfer the already-learned resource block allocation policy to the new resource block allocation policy that must be learned as the distribution and correlation change, thereby improving the convergence speed.…”
Section: Echo State Network for Self-Organizing Resource Allocation (mentioning)
confidence: 99%
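The transfer idea this statement describes (reuse the old policy as the starting point when the environment shifts) can be illustrated with generic warm-start Q-learning. Note this is not the cited ESN-based method itself; the toy allocation task, reward probabilities, and hyperparameters below are all assumptions for illustration:

```python
import random

def train_q(reward_probs, q=None, episodes=600, eps=0.1, alpha=0.1, seed=1):
    """Epsilon-greedy Q-learning on a one-state allocation task.
    Actions = resource blocks; reward_probs[a] = success prob of block a."""
    rng = random.Random(seed)
    q = [0.0] * len(reward_probs) if q is None else list(q)  # warm start: copy
    for _ in range(episodes):
        if rng.random() < eps:
            a = rng.randrange(len(q))                  # explore
        else:
            a = max(range(len(q)), key=q.__getitem__)  # exploit
        r = 1.0 if rng.random() < reward_probs[a] else 0.0
        q[a] += alpha * (r - q[a])
    return q

old_task = [0.2, 0.8, 0.3]    # source request distribution (assumed)
new_task = [0.25, 0.75, 0.35] # mildly shifted distribution (assumed)
q_src = train_q(old_task)
q_transfer = train_q(new_task, q=q_src, episodes=50)  # warm-started
q_scratch = train_q(new_task, episodes=50)            # learned from zero
```

Because the shifted task keeps the same best block, the warm-started learner needs only a few episodes to remain near-optimal, whereas learning from scratch restarts exploration.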
“…The procedure described by (14), (15), and (16) is called the fictitious play procedure. As described in (14), at the m-th iteration, a node attempts to learn the Nash maximizer, F*_m, given that its belief about the distribution of the nodes across the states is π_m.…”
Section: MF-MARL for Distributed Power Control (mentioning)
confidence: 99%
“…Next, at the (m+1)-th iteration, each node attempts to learn the Nash maximizer, F*_{m+1}. A discrete-time MFG is said to have FPP if and only if the procedure described by (14), (15), and (16) converges.…”
Section: MF-MARL for Distributed Power Control (mentioning)
confidence: 99%
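The best-respond-then-average loop of fictitious play can be sketched on a toy two-action mean-field congestion game (the cost model and step rule below are illustrative assumptions, not the cited paper's equations (14)-(16)):

```python
def fictitious_play(base_cost, iters=5000):
    """Fictitious play in a 2-action mean-field congestion game.
    Cost of action a under population distribution pi: base_cost[a] + pi[a].
    Each iteration: best-respond to the current belief pi, then fold the
    best response into pi with a 1/m step (i.e., keep an empirical average)."""
    pi = [0.5, 0.5]  # initial belief over the two actions
    for m in range(1, iters + 1):
        costs = [base_cost[a] + pi[a] for a in (0, 1)]
        br = min((0, 1), key=lambda a: costs[a])  # best response to belief
        for a in (0, 1):
            target = 1.0 if a == br else 0.0
            pi[a] += (target - pi[a]) / (m + 1)   # belief averaging step
    return pi

pi = fictitious_play([1.0, 1.5])
# analytic mean-field equilibrium: 1 + p = 1.5 + (1 - p)  =>  p = 0.75
```

Convergence of this belief sequence is exactly the fictitious play property (FPP) the statement refers to: congestion games of this form are potential games, for which fictitious play converges.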