The ability of a manipulator-equipped chaser spacecraft to autonomously capture a target spacecraft is an unsolved prerequisite for space debris removal and on-orbit servicing. This thesis investigates using deep reinforcement learning (DRL) to improve the capabilities of a manipulator-equipped chaser at this task. DRL allows for behaviour to be learned, rather than designed, according to a simple reward function. DRL learns the behaviour through trial and error, which is not feasible to perform on board a spacecraft. Training must therefore be performed in simulation, with the resulting behaviour transferred to the spacecraft. Transferring the learned-in-simulation behaviour to a real robot, however, is difficult due to dynamics differences between the simulator and the real world, i.e., the simulation-to-reality gap. This thesis develops, over the course of four increasingly difficult applications, a solution to the simulation-to-reality gap by restricting DRL to exclusively learn the guidance portion of the guidance, navigation, and control system needed for autonomous spacecraft operations. The first application is spacecraft proximity operations (without capture), where a DRL-based guidance strategy issuing desired velocity signals is designed, trained, and evaluated in simulation and experiment. Next, the DRL-based guidance strategy is improved upon and applied to a quadrotor proximity operations scenario. Here, it is demonstrated in simulation and experiment that desired acceleration signals lead to better performance than desired velocity signals. These two proof-of-concept results show the proposed DRL-based guidance strategy is viable for bringing DRL to real aerospace vehicles.
Next, the DRL-based guidance strategy is applied to a more difficult scenario: a multi-agent cooperative quadrotor runway inspection task, where fault-tolerant behaviour is successfully learned and demonstrated in both simulation and a real, outdoor, GPS-driven quadrotor facility. Finally, with the now-developed DRL-based guidance strategy, the author returns to the central motivator for this research: autonomous manipulator-based capture of a spinning spacecraft. The DRL-based guidance strategy learns this task in simulation and is successfully transferred to an experimental facility where similar results are obtained. Additionally, capture is successful in experiment despite large perturbations and initial conditions not seen during training. Improvements to the experimental facility were made to enable this research.
Preface

This is an 'integrated thesis' that contains chapters which have already been published or have been prepared for publication as journal articles or conference proceedings.

Chapter 2 has been peer-reviewed and was published in the Journal of Spacecraft and Rockets, with the authors retaining copyright. The paper is included in this thesis with minor formatting changes and variable renaming (for consistency between chapters), along with improvements to the theory in Sec. 2.3. This paper was c...