In this work, we address a relatively unexplored aspect of designing agents that learn from human reward. We investigate how an agent's non-task behavior can affect a human trainer's training and the agent's learning. We use the TAMER framework, which facilitates the training of agents via human-generated reward signals, i.e., judgments of the quality of the agent's actions, as the foundation for our investigation. Then, starting from the premise that the interaction between the agent and the trainer should be bidirectional, we propose two new training interfaces to increase a human trainer's active involvement in the training process and thereby improve the agent's task performance. One conveys the agent's uncertainty, a metric computed from data coverage; the other conveys its performance. Our results from a 51-subject user study show that these interfaces can induce trainers to train longer and give more feedback. The agent's performance, however, increases only in response to the addition of performance-oriented information, not to sharing uncertainty levels. These results suggest that the organizational maxim about human behavior, "you get what you measure" (i.e., sharing metrics with people causes them to focus on optimizing those metrics while de-emphasizing other objectives), also applies to the training of agents. Using principal component analysis, we show how trainers in the two conditions train agents differently. In addition, by simulating the influence of the agent's uncertainty-informative behavior on a human's training behavior, we show that trainers can be distracted by the agent sharing its uncertainty levels about its actions, giving poor feedback for the sake of reducing the agent's uncertainty without improving the agent's performance.
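To make the two ingredients of the abstract concrete, here is a hypothetical minimal sketch (not the authors' code) of a TAMER-style agent: it fits a tabular model of expected human reward per state-action pair, acts greedily on it, and reports a coverage-based uncertainty, i.e., uncertainty falls as a state-action pair receives more human labels. All class and method names are illustrative assumptions.

```python
import random
from collections import defaultdict

class TamerAgent:
    """Illustrative TAMER-style learner: models human reward, acts greedily."""

    def __init__(self, actions, lr=0.2):
        self.actions = actions
        self.lr = lr
        self.H = defaultdict(float)      # predicted human reward per (state, action)
        self.counts = defaultdict(int)   # human labels seen per (state, action)

    def act(self, state):
        # Greedy with respect to predicted human reward; random tie-breaking.
        best = max(self.H[(state, a)] for a in self.actions)
        return random.choice(
            [a for a in self.actions if self.H[(state, a)] == best]
        )

    def give_reward(self, state, action, h):
        # Incremental regression toward the trainer's scalar judgment h.
        key = (state, action)
        self.H[key] += self.lr * (h - self.H[key])
        self.counts[key] += 1

    def uncertainty(self, state, action):
        # Coverage-based uncertainty: fewer human labels -> higher uncertainty.
        return 1.0 / (1.0 + self.counts[(state, action)])
```

A brief usage: after `agent.give_reward("s0", "right", 1.0)`, the agent prefers `"right"` in state `"s0"`, and `agent.uncertainty("s0", "right")` drops from 1.0 to 0.5, which is the kind of signal the uncertainty-sharing interface would display to the trainer.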
Achieving accurate navigation and localization is crucial for Autonomous Underwater Vehicles (AUVs). Traditional navigation algorithms, such as the Extended Kalman Filter (EKF) and the Unscented Kalman Filter (UKF), require a system model and a measurement model for state estimation to obtain the AUV position. However, this may introduce modeling errors and state estimation errors that affect the final precision of the AUV navigation system to a certain extent. To avoid these problems, in this paper we propose a deep learning framework, NavNet, that treats AUV navigation as a deep sequential learning problem. Firstly, NavNet can take raw sensor data at different frequencies as input, benefiting from the sequential learning capability of Recurrent Neural Networks (RNNs). Secondly, NavNet uses a simplified attention mechanism and Fully Connected (FC) layers to output AUV displacements per unit time; accumulating these displacements yields low-frequency AUV navigation. More importantly, NavNet requires no model building or state estimation, which avoids introducing the associated errors. We compare the performance of NavNet to the EKF and UKF using data collected by running the Sailfish AUV at sea. Experimental results show that NavNet performs well in terms of both navigation accuracy and fault tolerance. In addition, a reliable fusion strategy combining NavNet with a conventional method is applied to achieve high-frequency AUV navigation. The experimental results show that the proposed architecture can serve as a reliable supplement that limits the error growth of conventional algorithms. INDEX TERMS Autonomous underwater vehicle, navigation, extended Kalman filter, unscented Kalman filter, sequential learning.
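The architecture described above (RNN over raw sensor windows, a simplified attention pooling, an FC head emitting per-unit-time displacements, and accumulation into a position track) can be sketched as follows. This is a hypothetical NumPy toy, not the authors' implementation: the Elman RNN, dot-product attention, 2-D displacement output, and all names (`NavNetSketch`, `track`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class NavNetSketch:
    """Toy NavNet-style model: RNN encoder + attention pooling + FC head."""

    def __init__(self, n_sensors, hidden=16):
        self.Wx = rng.normal(0, 0.1, (hidden, n_sensors))
        self.Wh = rng.normal(0, 0.1, (hidden, hidden))
        self.q = rng.normal(0, 0.1, hidden)        # attention query vector
        self.Wo = rng.normal(0, 0.1, (2, hidden))  # FC head -> (dx, dy)

    def displacement(self, window):
        # window: (T, n_sensors) raw sensor readings at the high sensor rate.
        h = np.zeros(self.Wh.shape[0])
        hs = []
        for x in window:                 # simple Elman RNN over the window
            h = np.tanh(self.Wx @ x + self.Wh @ h)
            hs.append(h)
        hs = np.stack(hs)                # (T, hidden)
        attn = softmax(hs @ self.q)      # simplified attention weights
        ctx = attn @ hs                  # attention-weighted hidden summary
        return self.Wo @ ctx             # predicted displacement (dx, dy)

def track(model, windows, start=(0.0, 0.0)):
    # Accumulate per-window displacements into a low-frequency position track.
    pos = np.array(start)
    path = [pos.copy()]
    for w in windows:
        pos = pos + model.displacement(w)
        path.append(pos.copy())
    return np.stack(path)
```

Note the design point the abstract makes: unlike an EKF/UKF, nothing here encodes vehicle dynamics or a measurement model; the mapping from raw sensors to displacement is learned end to end, and position comes purely from summing the predicted displacements.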