Partially observable Markov decision processes (POMDPs) have been successfully applied to various robot motion planning tasks under uncertainty. However, most existing POMDP algorithms assume a discrete state space, while the natural state space of a robot is often continuous. This paper presents Monte Carlo Value Iteration (MCVI) for continuous-state POMDPs. MCVI samples both a robot's state space and the corresponding belief space, avoiding inefficient a priori discretization of the state space as a grid. Both theoretical and preliminary experimental results indicate that MCVI is a promising new approach for robot motion planning under uncertainty.
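To make the sampling idea concrete: MCVI-style solvers for continuous-state POMDPs maintain beliefs as sets of sampled states (particles) rather than grid cells. Below is a minimal sketch of such a particle-based belief update; the additive-noise dynamics and Gaussian observation model are illustrative assumptions, not the paper's robot models.

```python
# Minimal sketch of a particle-based belief update for a continuous-state
# POMDP (assumed models, not the paper's): propagate sampled states through
# the transition model, weight by the observation likelihood, and resample.
import numpy as np

rng = np.random.default_rng(0)

def transition(states, action):
    # Illustrative additive-noise dynamics (assumption).
    return states + action + rng.normal(0.0, 0.1, size=states.shape)

def obs_likelihood(states, observation):
    # Illustrative Gaussian observation model p(o | s) (assumption).
    return np.exp(-0.5 * ((observation - states) / 0.2) ** 2)

def belief_update(particles, action, observation):
    # Monte Carlo belief update: no grid discretization of the state space.
    propagated = transition(particles, action)
    weights = obs_likelihood(propagated, observation)
    weights /= weights.sum()
    idx = rng.choice(len(propagated), size=len(propagated), p=weights)
    return propagated[idx]

# A belief over a 1-D continuous state, represented by 1000 sampled states.
belief = rng.normal(0.0, 1.0, size=1000)
belief = belief_update(belief, action=0.5, observation=0.7)
print(belief.mean(), belief.std())
```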
Inverse Optimal Control (IOC) assumes that demonstrations are solutions to an optimal control problem with unknown underlying costs and extracts the parameters of these costs. We propose the Inverse KKT framework, which assumes that the demonstrations fulfill the Karush-Kuhn-Tucker conditions of an unknown underlying constrained optimization problem, and extracts the parameters of this problem. We exploit these conditions to extract the relevant task spaces and cost-function parameters for skills that involve contacts. For a typical linear parameterization of the cost function, this reduces to a quadratic program, guaranteeing efficient convergence; the framework also handles arbitrary non-linear parameterizations. We further present a non-parametric variant of Inverse KKT that represents the cost function as a functional in a reproducing kernel Hilbert space. Our approach aims to push learning from demonstration toward more complex manipulation scenarios that involve interaction with objects and therefore the realization of contacts/constraints within the motion. We demonstrate the approach on manipulation tasks such as sliding a box, closing a drawer, and opening a door.
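To illustrate the core computation: for a linearly parameterized cost f(x; w) = w^T phi(x), Inverse KKT seeks weights w and multipliers lambda that make the KKT stationarity condition hold at the demonstrated optimum. The sketch below minimizes the stationarity residual subject to a unit-norm constraint via an SVD, a simplified stand-in for the paper's full quadratic program; all Jacobian values are made up for illustration.

```python
# Minimal sketch of the inverse-KKT idea for a linearly parameterized cost
# f(x; w) = w^T phi(x): find w and lambda minimizing the KKT stationarity
# residual || Jphi^T w + Jg^T lam ||^2 at the demonstrated optimum x*.
# All numbers below are made up; the paper's full formulation adds
# positivity/complementarity constraints, yielding a quadratic program.
import numpy as np

Jphi = np.array([[1.0, 0.5],   # d phi_k / d x_i at the demonstration (assumed)
                 [0.2, 1.0],
                 [0.3, 0.1]])  # 3 cost features, 2 state dimensions
Jg = np.array([[1.0, -1.0]])   # Jacobian of one active constraint (assumed)

# Stationarity: Jphi^T w + Jg^T lam = 0  ->  A @ theta = 0, theta = [w; lam].
A = np.hstack([Jphi.T, Jg.T])  # shape (2, 4)

# Minimize ||A theta||^2 subject to ||theta|| = 1: take the right singular
# vector with the smallest singular value (avoids the trivial theta = 0).
_, _, Vt = np.linalg.svd(A)
theta = Vt[-1]
w, lam = theta[:3], theta[3:]
print("cost weights:", w, "multiplier:", lam)
```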
Mobile devices and sensors are now in widespread use, and new wireless and networking technologies, such as wireless sensor networks, device-to-device (D2D) communication, and vehicular ad hoc networks, are emerging rapidly. These networks are expected to achieve a considerable increase in data rates, coverage, and the number of connected devices, with a significant reduction in latency and energy consumption. Because user devices and sensors have constrained energy resources, the problem of wireless network resource allocation becomes much more challenging, calling for more advanced techniques that trade off energy consumption against network performance. In this paper, we propose to use reinforcement learning, an efficient simulation-based optimization framework, to tackle this problem and maximize user experience. Our main contribution is a novel non-cooperative, real-time approach based on deep reinforcement learning that addresses the energy-efficient power allocation problem while still satisfying the quality-of-service constraints in D2D communication.
INDEX TERMS: Energy-efficient wireless communication, power allocation, D2D communication, multi-agent reinforcement learning, deep reinforcement learning.
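As a concrete illustration of the kind of agent involved, the sketch below shows a small Q-network in which a D2D transmitter maps local channel and interference features to one of a discrete set of power levels. The state dimension, candidate power levels, and layer widths are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of a deep-Q power-control agent for D2D links: each
# transmitter observes local features and picks a discrete power level.
# State features, power levels, and network sizes are assumptions.
import torch
import torch.nn as nn

POWER_LEVELS = [0.0, 5.0, 10.0, 15.0, 20.0]  # candidate powers in dBm (assumed)

class PowerQNet(nn.Module):
    def __init__(self, state_dim=6, n_actions=len(POWER_LEVELS)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),  # one Q-value per power level
        )

    def forward(self, state):
        return self.net(state)

q = PowerQNet()
state = torch.randn(1, 6)               # e.g. channel gains + interference (assumed)
action = q(state).argmax(dim=1).item()  # greedy power selection
print("chosen transmit power:", POWER_LEVELS[action], "dBm")
```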
This letter presents the first attempt to exploit deep learning (DL) in the signal detection of orthogonal frequency division multiplexing with index modulation (OFDM-IM) systems. In particular, we propose a novel DL-based detector, termed DeepIM, which employs a deep neural network with fully connected layers to recover the data bits in an OFDM-IM system. To enhance the performance of DeepIM, the received signal and channel vectors are pre-processed using domain knowledge before entering the network. Using datasets collected by simulations, DeepIM is first trained offline to minimize the bit error rate (BER), and the trained model is then deployed for online signal detection of OFDM-IM. Simulation results show that DeepIM achieves a near-optimal BER with a lower runtime than existing hand-crafted detectors.
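The sketch below illustrates the general shape of such a fully connected detector: it maps the pre-processed received signal and channel vectors of one OFDM-IM subblock to per-bit posteriors and is trained with a binary cross-entropy loss. The subblock size, bit count, and layer widths are illustrative assumptions.

```python
# Minimal sketch of a DeepIM-style fully connected detector for one
# OFDM-IM subblock. Sizes are assumptions; the paper pre-processes the
# inputs using domain knowledge before this stage.
import torch
import torch.nn as nn

N_SUB = 4   # subcarriers per OFDM-IM subblock (assumed)
N_BITS = 4  # index bits + modulation bits per subblock (assumed)

detector = nn.Sequential(
    nn.Linear(4 * N_SUB, 128),  # real/imag parts of received signal and channel
    nn.ReLU(),
    nn.Linear(128, N_BITS),
    nn.Sigmoid(),               # per-bit posteriors in [0, 1]
)

loss_fn = nn.BCELoss()  # trained offline against the true bits to lower BER

x = torch.randn(32, 4 * N_SUB)    # a batch of pre-processed inputs
bits = (detector(x) > 0.5).int()  # hard bit decisions at deployment time
print(bits.shape)
```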
Device-to-device (D2D) communication is an emerging technology in the evolution of 5G networks that enables vehicle-to-vehicle (V2V) communications. It is a core technique for the next generation of many platforms and applications, e.g. real-time high-quality video streaming, virtual reality gaming, and smart city operation. However, the rapid proliferation of user devices and sensors creates the need for more efficient resource allocation algorithms that enhance network performance while still guaranteeing quality of service. Deep reinforcement learning is emerging as a powerful tool for giving each node in the network a real-time self-organising ability. In this paper, we present two novel approaches based on the deep deterministic policy gradient algorithm, namely "distributed deep deterministic policy gradient" and "sharing deep deterministic policy gradient", for the multi-agent power allocation problem in D2D-based V2V communications. Numerical results show that our proposed models outperform other deep reinforcement learning approaches in terms of the network's energy efficiency and flexibility.
INDEX TERMS: Non-cooperative D2D communication, D2D-based V2V communications, power allocation, multi-agent deep reinforcement learning, deep deterministic policy gradient (DDPG).
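The sketch below shows the basic DDPG building blocks such approaches rest on: a deterministic actor mapping each agent's local observation to a continuous transmit power, and a critic scoring state-action pairs. The dimensions and power range are illustrative assumptions; the two proposed variants differ in how agents share or train these networks, which is not modeled here.

```python
# Minimal sketch of DDPG actor/critic networks for continuous power
# allocation in a V2V link. Observation features, widths, and the power
# range are assumptions, not the paper's architecture.
import torch
import torch.nn as nn

P_MAX = 23.0  # maximum transmit power in dBm (assumed)

class Actor(nn.Module):
    def __init__(self, obs_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),  # squash output into (0, 1)
        )

    def forward(self, obs):
        return P_MAX * self.net(obs)  # continuous power in (0, P_MAX)

class Critic(nn.Module):
    def __init__(self, obs_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + 1, 64), nn.ReLU(),
            nn.Linear(64, 1),  # Q(s, a)
        )

    def forward(self, obs, action):
        return self.net(torch.cat([obs, action], dim=-1))

actor, critic = Actor(), Critic()
obs = torch.randn(1, 8)  # local CSI/interference features (assumed)
power = actor(obs)       # deterministic power decision
print(float(power), float(critic(obs, power)))
```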