SUMMARY: This paper is motivated by an optimal boundary control problem for the cooling process of molten and already formed glass down to room temperature. The high temperatures at which glass is processed require radiative heat transfer to be included in the computational model. Since the complete radiative heat transfer equations are too complex for optimization purposes, we use simplified approximations of spherical harmonics coupled with a practically relevant frequency bands model. The optimal control problem is considered as a partial differential algebraic equation (PDAE)-constrained optimization problem with box constraints on the control. In this paper, we augment the objective by a functional depending on the state gradient, which forces a minimization of thermal stress inside the glass. To guarantee consistent and grid-independent values of the reduced objective gradient at the end of the cooling process, we pursue two approaches. The first includes the temperature gradient with a time-dependent, linearly decreasing weight. In the second approach, we augment the objective functional by final state tracking and final state gradient optimization. To determine an optimal boundary control, we apply a projected gradient method with the Armijo step size rule. The reduced objective gradient is computed by the continuous adjoint approach. The arising time-dependent PDAEs are solved numerically by variable step size one-step methods of Rosenbrock type in time and adaptive multilevel finite elements in space. We present two-dimensional numerical results for an infinitely long glass block and compare the two approaches derived to ensure consistency at the end of the cooling process.
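The optimization loop named in the summary, a projected gradient method with the Armijo step size rule under box constraints, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`project`, `projected_gradient`), the parameter values, and the toy quadratic objective are all assumptions made for illustration, and the gradient is supplied analytically rather than computed by the continuous adjoint approach.

```python
def project(u, lo, hi):
    """Project u onto the box [lo, hi] component-wise."""
    return [min(max(ui, lo), hi) for ui in u]


def projected_gradient(f, grad, u0, lo, hi, s0=1.0, beta=0.5, sigma=1e-4,
                       tol=1e-8, max_iter=500):
    """Projected gradient descent with Armijo backtracking (illustrative)."""
    u = project(u0, lo, hi)
    for _ in range(max_iter):
        g = grad(u)
        # Stationarity measure: distance between u and its projected unit step.
        trial = project([ui - gi for ui, gi in zip(u, g)], lo, hi)
        if sum((a - b) ** 2 for a, b in zip(trial, u)) ** 0.5 <= tol:
            break
        fu = f(u)
        s = s0
        while True:
            u_new = project([ui - s * gi for ui, gi in zip(u, g)], lo, hi)
            step_sq = sum((a - b) ** 2 for a, b in zip(u_new, u))
            # Armijo condition along the projection arc; the lower bound on s
            # is a safeguard against endless backtracking.
            if f(u_new) <= fu - (sigma / s) * step_sq or s < 1e-12:
                break
            s *= beta
        u = u_new
    return u


# Toy stand-in for the reduced objective (hypothetical): track a target
# that lies partly outside the admissible box [0, 1].
target = [2.0, -0.5, 0.3]


def f(u):
    return 0.5 * sum((ui - ti) ** 2 for ui, ti in zip(u, target))


def grad(u):
    return [ui - ti for ui, ti in zip(u, target)]


u_opt = projected_gradient(f, grad, [0.5, 0.5, 0.5], lo=0.0, hi=1.0)
```

For this toy problem the minimizer is the target clipped to the box, so the first two components of `u_opt` end up on the bounds while the third stays in the interior. In the setting of the paper, `f` and `grad` would instead require solving the state and adjoint PDAEs.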
Reinforcement learning (RL) has become a highly successful framework for learning in Markov decision processes (MDP). As RL is adopted in realistic and complex environments, solution robustness becomes an increasingly important aspect of RL deployment. Nevertheless, current RL algorithms struggle with robustness to uncertainty, disturbances, or structural changes in the environment. We survey the literature on robust approaches to reinforcement learning and group these methods into four categories: (i) Transition robust designs account for uncertainties in the system dynamics by manipulating the transition probabilities between states; (ii) Disturbance robust designs leverage external forces to model uncertainty in the system behavior; (iii) Action robust designs redirect transitions of the system by corrupting an agent’s output; (iv) Observation robust designs exploit or distort the perceived system state of the policy. Each of these robust designs alters a different aspect of the MDP. Additionally, we address the connection of robustness to the risk-based and entropy-regularized RL formulations. The resulting survey covers all fundamental concepts underlying the approaches to robust reinforcement learning and their recent advances.
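To make concrete where each of the four designs intervenes in the MDP, the following toy sketch marks the four perturbation points in a single rollout loop. Everything here is a hypothetical illustration rather than a method from the survey: the 1-D dynamics, the policy, and the parameter names (`slip_prob`, `disturbance`, `action_flip_prob`, `obs_noise`) are assumptions, and the perturbations are random or fixed, whereas robust formulations typically consider worst-case choices.

```python
import random


def step(state, action, disturbance=0.0):
    """Toy 1-D dynamics: next state = state + action + external disturbance."""
    return state + action + disturbance


def policy(observation):
    """Toy policy: move toward the origin with an action bounded to [-1, 1]."""
    return max(-1.0, min(1.0, -observation))


def rollout(steps, rng, slip_prob=0.0, disturbance=0.0,
            action_flip_prob=0.0, obs_noise=0.0):
    """Run the toy system from state 5.0 and return the final state."""
    state = 5.0
    for _ in range(steps):
        # (iv) Observation: the policy sees a distorted state.
        obs = state + rng.uniform(-obs_noise, obs_noise)
        action = policy(obs)
        # (iii) Action: the agent's output is corrupted before execution.
        if rng.random() < action_flip_prob:
            action = -action
        # (i) Transition: altered transition probabilities, modeled here as
        # a chance that the action has no effect on the next state.
        if rng.random() < slip_prob:
            action = 0.0
        # (ii) Disturbance: an external force enters the dynamics.
        state = step(state, action, disturbance)
    return state


rng = random.Random(0)
nominal_final = rollout(10, rng)
disturbed_final = rollout(20, rng, disturbance=0.5)
```

In the nominal rollout the policy drives the state to the origin, while a constant disturbance of 0.5 shifts the resting point away from it, which is the kind of degradation a disturbance robust design is meant to anticipate.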
COCoMoPL [6] is a recently developed approach Combining Optimal Control, Movement Primitives and Learning for the generation of humanoid walking motions. It solves optimal control problems based on detailed dynamic models of the robot for a variety of walking parameters and uses the solutions as training data to create movement primitives that are very close to feasibility and optimality. These can be employed to synthesize complex walking sequences for humanoid robots online in a very efficient way. We demonstrate, for the first time, that COCoMoPL works on a real humanoid robot, here HRP-2 with 36 DOF and 30 position-controlled actuators. To this end, it was necessary to significantly extend the existing approach by including transition steps in the training data, modifying the movement primitives (MPs) to admit these transitions, improving the representation of the ZMP MPs, and tightening the transition conditions at the beginning and end of steps. We present a thorough validation of the method in simulation and on the real robot for a challenging sequence of movements. We also compare the characteristics of movements after each step of the methodology.