In a wireless powered communication network (WPCN), sensor nodes harvest energy to transmit information. Under a harvest-then-transmit (HT) protocol, nodes are classified as either energy receiving (ER) or data transmitting (DT) nodes depending on their current level of harvested energy. Since nodes may join or leave a network at any time and energy levels vary, the distribution of ER and DT nodes changes over time. Because the number of contending DT nodes is highly dynamic, the access point (AP) requires a quick learning mechanism. We propose a learning AP that learns from experience and adapts the frame size to changes in the number of DT nodes. The proposed learning AP is shown to learn well and react to the changing situation. We compare the performance of the proposed learning mechanism with conventional WPCN and HT FSA schemes. The proposed reinforcement learning (RL) scheme outperforms the comparative schemes in terms of success rate and delay.
KEYWORDS: MAC, Q-learning, wireless powered communication networks, wireless sensor networks

Int J Commun Syst. 2019;32:e4027. wileyonlinelibrary.com/journal/dac IQBAL ET AL.

The amounts of energy harvested and consumed by nodes, together with nodes dynamically joining and leaving, change the number of data transmitting nodes in a network.10 Machine learning (ML) has drawn much attention recently. Learning algorithms can be regarded as a mechanism for finding the best solution in a highly fluctuating network situation, where mathematical solutions are complex and hard to find. A learning mechanism allows a system to adapt to the optimal frame size as soon as changes are experienced. Reinforcement learning (RL)-based algorithms are inspired by animal learning behavior11,12 and are effective ML algorithms. We have investigated the frame size optimization problem in a highly dynamic energy harvesting network and developed a Q-learning-based mechanism as a solution. Variations in the network status (the energy levels of nodes and the numbers of data transmitting and energy harvesting nodes) are caused by the random and dynamic energy harvesting and energy consumption of nodes. Since the number of data transmitting nodes changes dynamically, the optimal communication frame size changes with it: the optimal frame size is directly related to the number of contending nodes. The frame size therefore needs to be optimized automatically and immediately, by learning from the outcomes of the changes in the number of data transmitting nodes. Predicting or estimating the number of data transmitting nodes in a network helps in adapting the optimal frame size to achieve the maximum channel utilization. We have formulated and developed a semi-random Markov decision process (MDP), in which the states are the frame sizes and the actions are increases or decreases of the frame size.
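The MDP formulation described above can be illustrated with a minimal tabular Q-learning sketch. This is not the paper's implementation: the candidate frame sizes, the learning parameters, the epsilon-greedy policy, and the reward (the per-slot success rate of a simulated frame-slotted ALOHA contention round with a fluctuating number of DT nodes) are all illustrative assumptions.

```python
import random

# Illustrative sketch: an AP adapts its frame size with tabular Q-learning.
# States index candidate frame sizes; actions shrink, keep, or grow the frame.
FRAME_SIZES = [4, 8, 16, 32, 64]       # assumed candidate frame sizes
ACTIONS = [-1, 0, +1]                  # move to smaller / same / larger size
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed learning parameters

Q = {(s, a): 0.0
     for s in range(len(FRAME_SIZES))
     for a in range(len(ACTIONS))}

def simulate_frame(frame_size, num_dt_nodes, rng):
    """Each contending DT node picks one slot at random; a slot succeeds
    only if exactly one node chose it. Reward is the per-slot success rate."""
    slots = [0] * frame_size
    for _ in range(num_dt_nodes):
        slots[rng.randrange(frame_size)] += 1
    successes = sum(1 for count in slots if count == 1)
    return successes / frame_size

def step(state, num_dt_nodes, rng):
    """One Q-learning step: epsilon-greedy action, observe reward, update Q."""
    if rng.random() < EPSILON:
        a = rng.randrange(len(ACTIONS))
    else:
        a = max(range(len(ACTIONS)), key=lambda x: Q[(state, x)])
    next_state = min(max(state + ACTIONS[a], 0), len(FRAME_SIZES) - 1)
    reward = simulate_frame(FRAME_SIZES[next_state], num_dt_nodes, rng)
    best_next = max(Q[(next_state, x)] for x in range(len(ACTIONS)))
    Q[(state, a)] += ALPHA * (reward + GAMMA * best_next - Q[(state, a)])
    return next_state

rng = random.Random(7)
state = 0
for _ in range(5000):
    num_dt = rng.randint(10, 20)  # assumed fluctuating number of DT nodes
    state = step(state, num_dt, rng)
print("final frame size:", FRAME_SIZES[state])
```

Over time the agent is drawn toward frame sizes commensurate with the contending-node count, since both much smaller frames (many collisions) and much larger frames (many idle slots) yield a lower per-slot success rate.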
The learning agent, the AP in this case, learns from its contention result experiences and adapts its frame size. The learning of the frame size is performed by observing the number of idle and ...