Evaluating the worst-case performance of a reinforcement learning (RL) agent under the strongest/optimal adversarial perturbations on state observations (within some constraints) is crucial for understanding the robustness of RL agents. However, finding the optimal adversary is challenging, both in whether the optimal attack can be found and in how efficiently it can be found. Existing works on adversarial RL either use heuristics-based methods that may not find the strongest adversary, or directly train an RL-based adversary by treating the agent as a part of the environment, which can find the optimal adversary but may become intractable in a large state space. In this paper, we propose a novel attacking algorithm which has an RL-based "director" searching for the optimal policy perturbation, and an "actor" crafting state perturbations following the directions from the director (i.e., the actor executes targeted attacks). Our proposed algorithm, PA-AD, is theoretically optimal against an RL agent and significantly improves the efficiency compared with prior RL-based works in environments with large or pixel state spaces. Empirical results show that our proposed PA-AD universally outperforms state-of-the-art attacking methods in a wide range of environments. Our method can be easily applied to any RL algorithm to evaluate and improve its robustness.
Preprint. Under review.
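The director-actor decomposition described above can be illustrated with a minimal sketch. This is not the PA-AD implementation: the victim policy is a toy linear-softmax model, the "director" is a fixed heuristic standing in for the learned RL director, and the "actor" runs projected gradient steps (with numerical gradients) inside an L-infinity ball; all names and interfaces here are hypothetical.

```python
import numpy as np

def victim_policy(state, W):
    """Hypothetical linear-softmax victim policy: W @ state -> action probs."""
    logits = W @ state
    e = np.exp(logits - logits.max())
    return e / e.sum()

def director(state):
    """Stand-in for the RL-based director: proposes a target action
    distribution (a policy perturbation) for the victim. In PA-AD this
    is learned; here it is a fixed heuristic for illustration."""
    return np.array([0.0, 0.0, 1.0])  # push the victim toward action 2

def actor(state, W, target, eps=0.5, steps=50, lr=0.1):
    """Actor: craft a bounded state perturbation (L-inf ball of radius eps)
    that moves the victim's action distribution toward the director's
    target, via projected sign-gradient steps on a cross-entropy
    objective (gradients estimated numerically for simplicity)."""
    delta = np.zeros_like(state)
    for _ in range(steps):
        grad = np.zeros_like(state)
        # cross-entropy between the director's target and the victim's policy
        f = lambda s: -np.sum(target * np.log(victim_policy(s, W) + 1e-12))
        for i in range(len(state)):
            d = np.zeros_like(state)
            d[i] = 1e-4
            grad[i] = (f(state + delta + d) - f(state + delta - d)) / 2e-4
        delta = np.clip(delta - lr * np.sign(grad), -eps, eps)  # project to budget
    return state + delta
```

A targeted attack toward the director's proposed distribution is generally much cheaper than searching the raw state space directly, which is the efficiency gain the abstract refers to.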
In many reinforcement learning (RL) applications, the observation space is specified by human developers and restricted by physical realizations, and may thus be subject to dramatic changes over time (e.g. an increased number of observable features). However, when the observation space changes, the previous policy will likely fail due to the mismatch of input features, and another policy must be trained from scratch, which is inefficient in terms of computation and sample complexity. Following theoretical insights, we propose a novel algorithm which extracts the latent-space dynamics in the source task, and transfers the dynamics model to the target task to use as a model-based regularizer. Our algorithm works for drastic changes of observation space (e.g. from vector-based observation to image-based observation), without any inter-task mapping or any prior knowledge of the target task. Empirical results show that our algorithm significantly improves the efficiency and stability of learning in the target task.
* The work was done while the author was an intern at Unity Technologies.
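The idea of a transferred latent-dynamics model acting as a regularizer can be sketched as follows. This is a simplified illustration under assumed interfaces, not the paper's algorithm: `enc` is the target-task encoder being trained, `dyn` is the frozen latent dynamics model taken from the source task, and the penalty is a plain mean-squared consistency term.

```python
import numpy as np

def model_based_regularizer(enc, dyn, obs, act, next_obs):
    """Penalize the target-task encoder for disagreeing with the frozen
    latent dynamics transferred from the source task.
    enc: target-task encoder, obs -> latent (being trained)
    dyn: frozen latent dynamics from the source task, (z, a) -> z'"""
    z, z_next = enc(obs), enc(next_obs)
    z_pred = dyn(z, act)                      # where the source dynamics say we should land
    return float(np.mean((z_pred - z_next) ** 2))
```

In a toy setting where the target observations are a linear re-rendering of the same latent state, an encoder that inverts the rendering drives this penalty to zero, while a misaligned encoder is penalized, which is the sense in which the source dynamics guide representation learning in the target task.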
Crystalline lithium fluoride (LiF) has been intensively pursued as a potential alternative solid electrolyte (SE) owing to its excellent chemical and electrochemical oxidation stability and good deformability. However, due to its low ionic conductivity, LiF remains challenging for practical SE applications. Herein, a Li-Zr-F composite-based SE prepared by liquid-mediated synthesis is proposed and studied. Methanol (CH<sub>3</sub>OH) was mainly evaluated as a liquid-mediated precursor for synthesizing Li-Zr-F composites at LiF:ZrF<sub>4</sub> stoichiometric proportions of 2:1 and 2:0.8, with a subsequent annealing process at 25°C/150°C, 50°C/150°C, and 70°C/150°C, respectively. X-ray diffraction results revealed that the Li-Zr-F composites crystallized in three main phase formations: Li<sub>2</sub>ZrF<sub>6</sub> ( ), Li<sub>2</sub>ZrF<sub>6</sub> ( ), and Li<sub>4</sub>ZrF<sub>8</sub> ( ) octahedron structures. In addition, the effect of the cation stacking sublattice synthesized via the methanol mediator on the ionic conduction of the Li-Zr-F composites was investigated using electrochemical impedance spectroscopy (EIS). Through Zr<sup>4+</sup> substitution, the Li<sub>2</sub>ZrF<sub>6</sub> ( )-based SE exhibited the highest ionic conductivity, which increased to 2.40 × 10<sup>-8</sup> S/cm and 3.89 × 10<sup>-8</sup> S/cm at the LiF:ZrF<sub>4</sub> stoichiometric proportion of 2:0.8 and a drying temperature of 50°C/150°C, respectively. An activation energy of 0.21 eV was achieved for a battery with the Li<sub>2</sub>ZrF<sub>6</sub> ( )-based SE, whereas LiF exhibited up to 0.78 eV, leading to a low kinetic rate for ion diffusion. These results implied that the Li<sub>2</sub>ZrF<sub>6</sub> ( )-based SE was successfully synthesized under the optimal condition of CH<sub>3</sub>OH at 50°C/150°C, which could improve the ionic conductivity of LiF.
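The gap between the two reported activation energies (0.21 eV for the Li<sub>2</sub>ZrF<sub>6</sub>-based SE versus 0.78 eV for LiF) can be put in perspective with a quick Arrhenius estimate. This is a back-of-the-envelope sketch assuming simple Arrhenius behavior with equal pre-exponential factors, which real electrolytes need not satisfy.

```python
import math

K_B_EV = 8.617e-5  # Boltzmann constant in eV/K

def arrhenius_rate_ratio(ea_low, ea_high, temp_k=298.15):
    """Ratio of thermally activated hopping rates exp(-Ea / kT) for two
    activation energies, assuming equal prefactors (a simplification)."""
    return math.exp((ea_high - ea_low) / (K_B_EV * temp_k))

# At room temperature, a 0.21 eV barrier vs LiF's 0.78 eV barrier:
ratio = arrhenius_rate_ratio(0.21, 0.78)
```

Under these assumptions the lower barrier corresponds to a hopping rate several billion times faster at room temperature, consistent with the abstract's point that the high barrier of pristine LiF implies a low kinetic rate for ion diffusion.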
Lower alcohols (C1−C7) are closely related to daily life, and some of them are harmful to human health. For example, the methanol in liquor is harmful to...
Communication is important in many multi-agent reinforcement learning (MARL) problems for agents to share information and make good decisions. However, when deploying trained communicative agents in a real-world application where noise and potential attackers exist, the safety of communication-based policies becomes a severe issue that is underexplored. Specifically, if communication messages are manipulated by malicious attackers, agents relying on untrustworthy communication may take unsafe actions that lead to catastrophic consequences. Therefore, it is crucial to ensure that agents will not be misled by corrupted communication, while still benefiting from benign communication. In this work, we consider an environment with N agents, where the attacker may arbitrarily change the communication from any C < (N−1)/2 agents to a victim agent. For this strong threat model, we propose a certifiable defense by constructing a message-ensemble policy that aggregates multiple randomly ablated message sets. Theoretical analysis shows that this message-ensemble policy can utilize benign communication while being certifiably robust to adversarial communication, regardless of the attacking algorithm. Experiments in multiple environments verify that our defense significantly improves the robustness of trained policies against various types of attacks.
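The message-ensemble construction can be sketched in a few lines. This is an illustrative Monte Carlo version, not the certified procedure from the paper (which reasons over ablated subsets exhaustively to obtain guarantees); `base_policy` and its interface are assumed for the example.

```python
import random
from collections import Counter

def ensemble_action(base_policy, state, messages, k, n_samples=200, seed=0):
    """Message-ensemble policy (sketch): run the base policy on many
    randomly ablated message subsets of size k, then majority-vote the
    resulting discrete actions. When fewer than half of the messages are
    corrupted, most subsets are benign-dominated and outvote the attack.
    base_policy(state, msgs) -> discrete action; hypothetical interface."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_samples):
        subset = sorted(rng.sample(range(len(messages)), k))
        votes[base_policy(state, [messages[i] for i in subset])] += 1
    return votes.most_common(1)[0][0]
```

For instance, with seven messages of which two are adversarial, a subset of size three contains an adversarial majority only when both corrupted messages are drawn together, which happens in a small fraction of subsets, so the vote recovers the benign action.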
Recent studies reveal that a well-trained deep reinforcement learning (RL) policy can be particularly vulnerable to adversarial perturbations on input observations. Therefore, it is crucial to train RL agents that are robust against any attacks with a bounded budget. Existing robust training methods in deep RL either treat correlated steps separately, ignoring the robustness of long-term rewards, or train the agents and RL-based attacker together, doubling the computational burden and sample complexity of the training process. In this work, we propose a strong and efficient robust training framework for RL, named Worst-case-aware Robust RL (WocaR-RL), that directly estimates and optimizes the worst-case reward of a policy under bounded ℓp attacks without requiring extra samples for learning an attacker. Experiments on multiple environments show that WocaR-RL achieves state-of-the-art performance under various strong attacks, and obtains significantly higher training efficiency than prior state-of-the-art robust training methods. The code of this work is available at https://github.com/umd-huang-lab/WocaR-RL.
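The core idea of directly optimizing an estimated worst-case value, rather than training a separate attacker, can be sketched as follows. This is a simplified stand-in for WocaR-RL's worst-attack bound, not the actual method: `perturbable[a]` is an assumed precomputed set of actions the attacker could force instead of action `a` under its budget, and the loss weighting `kappa` is illustrative.

```python
def worst_case_value(q_values, policy_probs, perturbable):
    """Estimate the worst-case expected Q-value when a bounded attacker
    can shift the agent from its chosen action a onto any action in
    perturbable[a]. The attacker picks the worst reachable action."""
    worst = 0.0
    for a, p in enumerate(policy_probs):
        worst += p * min(q_values[b] for b in perturbable[a])
    return worst

def wocar_style_loss(td_loss, q_values, policy_probs, perturbable, kappa=0.5):
    """Combine the ordinary TD loss with a worst-case value bonus, so
    training trades off clean performance against the estimated
    worst-case return without sampling an attacker (sketch)."""
    return td_loss - kappa * worst_case_value(q_values, policy_probs, perturbable)
```

Because the worst-case term is computed from the agent's own value estimates, no extra environment samples are needed for an attacker, which is the efficiency advantage the abstract highlights over alternating attacker-agent training.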