In wireless control systems, remote control of plants is achieved through closing of the control loop over a wireless channel. As wireless communication is noisy and subject to packet dropouts, proper allocation of limited resources, e.g. transmission power, across plants is critical for maintaining reliable operation. In this paper, we formulate the design of an optimal resource allocation policy that uses current plant states and wireless channel conditions to assign resources used to send control actuation information back to plants. While this problem is challenging due to its infinite dimensionality and need for explicit system model and state knowledge, we propose the use of deep reinforcement learning techniques to find data-driven resource allocation policies. In particular, we use model-free policy gradient methods to directly learn continuous power allocation policies without knowledge of plant dynamics or communication models. Numerical simulations demonstrate the strong performance of learned policies relative to baseline resource allocation methods. I. INTRODUCTION Wireless communication networks are frequently used to exchange data between plants, sensors and actuators in control systems. The use of wireless networks in lieu of wired communication makes the installation of components easier and maintenance more flexible, but also adds particular challenges to the design of control and communication policies [1], [2]. Wireless networks are in general more noisy than their wired counterparts [3], while reliable operation of control systems demands rapid communication and low message error rates-requirements in turn constrained by the limited resources available in that network. It is natural in this setting to look for an optimal way to distribute the resources available in the network among the plants sharing that communication medium. In wireless networks, resource allocation aims to balance power consumption, latency and reliability while taking into account stochastic noise and rapid variability [4], [5]. That involves optimizing a performance metric over a function, resulting in an infinite dimensional problem that is usually hard to solve. The formulation of the resource allocation problem, however, resembles a statistical learning problem [5], which allows one to treat resource allocation in a datadriven fashion [6], [7]. When control plants share a common communication medium, resource allocation policies should also take into account the dynamics of each plant. For a