In this paper, we propose an energy-efficient architecture that accepts both image and text inputs, as a step toward designing reinforcement learning agents that can understand human language and act in real-world environments. We evaluate the proposed method in three software environments and on a low-power drone, the Crazyflie, which navigates toward specified goals while avoiding obstacles. To find the most efficient language-guided RL model, we implement the model with various configurations of image input size and text instruction size on the Crazyflie's GAP8 processor, which consists of 8 RISC-V cores. We measure the task completion success rate, onboard power consumption, latency, and memory usage on the GAP8 and compare them with the Jetson TX2 ARM CPU and the Raspberry Pi 4. The results show that decreasing the input image size by 20% yields up to a 78% energy improvement while maintaining an 82% task completion success rate.