Abstract: With advances in algorithms, deep reinforcement learning (DRL) offers solutions to trajectory planning in uncertain environments. Unlike traditional trajectory planning, which requires considerable effort to tackle complicated high-dimensional problems, DRL enables a robot manipulator to autonomously learn and discover optimal trajectory planning by interacting with the environment. In this article, we present state-of-the-art DRL-based collision-avoidance trajectory planning fo…
“…These traditional methods are limited to low-dimensional problems or prone to getting stuck in local minima. Recently, more and more researchers have resorted to reinforcement learning to tackle complicated planning problems in uncertain environments with human coexistence (Chen et al., 2022). This will be a future research direction for autonomous baby stroller movement.…”
The increasing number of newborns has stimulated the infant market. In particular, the baby stroller, an important life companion for both babies and parents, has attracted growing attention from society. Stroller design and functionality are of vital importance to babies' physiological and psychological health as well as brain development. Therefore, in this paper, we propose a modularization design method for a novel four-wheeled baby stroller based on the KANO model to ensure mechanical safety and incorporate more functionality. Manual control of the baby stroller requires a rapid, fully controlled response from the human motor system, which poses a potential risk. To enhance the safety and stability of stroller motion, especially in situations where manual control is hard to achieve (e.g., sharp turns), we propose an autonomous motion control scheme based on model predictive control. Both the modularization design and the motion controller are verified in the MATLAB simulation environment through path-tracking tasks. Feasibility is validated by satisfactory experimental results, with lateral position error within a reasonable range and good trajectory smoothness.
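The abstract reports only MATLAB simulation results, so the following is just a rough Python sketch of the kind of receding-horizon (MPC) path tracking involved, using a unicycle model. The model, horizon length, cost weights, and limits are illustrative assumptions, not the authors' design.

```python
# Minimal MPC path-tracking sketch (illustrative; not the paper's controller).
# A unicycle model tracks a reference path by optimizing a short horizon of
# yaw-rate commands; all weights and limits below are assumed values.
import numpy as np
from scipy.optimize import minimize

DT, HORIZON, SPEED = 0.1, 10, 1.0         # step [s], lookahead steps, speed [m/s]

def rollout(state, omegas):
    """Simulate unicycle kinematics for a sequence of yaw rates."""
    x, y, th = state
    traj = []
    for w in omegas:
        x += SPEED * np.cos(th) * DT
        y += SPEED * np.sin(th) * DT
        th += w * DT
        traj.append((x, y))
    return np.array(traj)

def mpc_step(state, ref):
    """Return the first yaw-rate command of the optimized horizon."""
    def cost(omegas):
        traj = rollout(state, omegas)
        track = np.sum((traj - ref[:HORIZON]) ** 2)   # path-tracking error
        smooth = 0.1 * np.sum(np.diff(omegas) ** 2)   # penalize jerky steering
        return track + smooth
    res = minimize(cost, np.zeros(HORIZON),
                   bounds=[(-1.5, 1.5)] * HORIZON)    # yaw-rate limits [rad/s]
    return res.x[0]

# Usage: start 0.2 m below a straight reference path along y = 0; the first
# optimized command steers back toward the path.
state = np.array([0.0, -0.2, 0.0])
ref = np.array([[SPEED * DT * (i + 1), 0.0] for i in range(HORIZON)])
print(mpc_step(state, ref))               # positive yaw rate, toward the path
```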
“…Reinforcement learning is an overall process comprising the agent's trial, evaluation, and action memory (Clifton and Laber, 2020; Chen et al., 2022; Cong et al., 2022; Li et al., 2022). The agent learns a mapping from environment states to actions such that it reaps the greatest reward after carrying out a particular action.…”
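As a concrete toy instance of that trial-evaluation-memory loop, the sketch below runs tabular Q-learning on a five-state chain. The environment and hyperparameters are illustrative inventions, not taken from the cited works.

```python
# Minimal tabular Q-learning sketch: the agent maps states to actions and
# reinforces whichever action earned the larger long-run reward.
import random

N_STATES, ACTIONS = 5, (-1, +1)           # chain 0..4; move left or right
GAMMA, ALPHA, EPS = 0.9, 0.5, 0.1
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for _ in range(500):                      # episodes of trial and error
    s = 0
    while s != N_STATES - 1:              # goal: reach the right end of the chain
        a = random.choice(ACTIONS) if random.random() < EPS \
            else max(ACTIONS, key=lambda a: Q[(s, a)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        # Temporal-difference update: memorize the evaluated outcome.
        Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

print(max(ACTIONS, key=lambda a: Q[(0, a)]))  # learned policy moves right: +1
```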
As astronauts perform on-orbit servicing during extravehicular activity (EVA) without the help of the space station's robotic arms, it is difficult and labor-intensive to maintain an appropriate position in case of impact. To solve this problem, we propose a wearable robotic limb system for astronaut assistance together with a variable damping control method for maintaining the astronaut's position. The requirements on the astronaut's impact-resisting ability during EVA were analyzed, including the capabilities of deviation resistance, fast return, oscillation resistance, and accurate return. To meet these needs, the system of the astronaut with robotic limbs was modeled and simplified. Combining this simplified model with a reinforcement learning algorithm, a variable damping controller for the end of the robotic limb was obtained, which regulates the dynamic performance of the robot end to resist oscillation after impact. A weightless simulation environment for the astronaut with robotic limbs was constructed. The simulation results demonstrate that the proposed method meets the recommended requirements for maintaining an astronaut's position during EVA. No matter how the damping coefficient was set, the fixed damping control method failed to meet all four requirements at the same time. In comparison, the variable damping controller proposed in this paper fully satisfied all the impact-resisting requirements. It prevented excessive deviation from the original position and achieved a fast return to the starting point: the maximum deviation displacement was reduced by 39.3% and the recovery time was cut by 17.7%. In addition, it prevented reciprocating oscillation and returned to the original position accurately.
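The abstract gives no controller details, so the 1-D sketch below only illustrates the idea of variable versus fixed damping after an impact. The state-dependent damping schedule is a hand-crafted stand-in for the paper's learned (RL) policy, and all masses, gains, and thresholds are assumed values.

```python
# Toy 1-D sketch of variable vs. fixed damping after an impact.
# The damping law in variable() is a hand-crafted stand-in for a learned
# policy; mass, stiffness, and the impact velocity are assumed values.
import numpy as np

M, K, DT = 70.0, 50.0, 0.001              # mass [kg], stiffness [N/m], step [s]

def simulate(damping_fn, v0=0.5, T=10.0):
    """Integrate m*x'' = -k*x - d(x, v)*x' from an impact velocity v0."""
    x, v, peak = 0.0, v0, 0.0
    for _ in range(int(T / DT)):
        d = damping_fn(x, v)
        a = (-K * x - d * v) / M
        v += a * DT
        x += v * DT
        peak = max(peak, abs(x))
    return peak, abs(x)                   # max deviation, final offset

fixed = lambda x, v: 30.0                 # one underdamped compromise value

def variable(x, v):
    # Damp hard while being pushed away, lightly on the way back,
    # then hard again near the origin to kill residual oscillation.
    moving_away = x * v > 0
    near_home = abs(x) < 0.02
    return 300.0 if (moving_away or near_home) else 40.0

print(simulate(fixed))     # (peak deviation, final offset) for fixed damping
print(simulate(variable))  # smaller deviation and a more accurate return
```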
“…Undesirable overestimation bias and the accumulation of function approximation errors in temporal-difference methods may lead to sub-optimal policy updates and divergent behaviors (Thrun and Schwartz, 1993; Pendrith and Ryan, 1997; Fujimoto et al., 2018; Chen et al., 2022). Most model-free off-policy RL methods learn an approximate lower confidence bound of the Q-function (Fujimoto et al., 2018; Kuznetsov et al., 2020; Lan et al., 2020; Chen et al., 2021; Lee et al., 2021) to avoid overestimation by introducing underestimation bias.…”
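One concrete form of such a lower bound is the clipped double-Q target of Fujimoto et al. (2018), which bootstraps from the minimum of two critic estimates. A schematic sketch, with illustrative values:

```python
# Schematic clipped double-Q target (after Fujimoto et al., 2018):
# taking the min of two critics biases the bootstrap target low,
# countering the overestimation that max-based targets accumulate.
import numpy as np

def td_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Pessimistic bootstrap: min over the two critic estimates."""
    q_min = np.minimum(q1_next, q2_next)
    return reward + gamma * (1.0 - done) * q_min

# Example: the two critics disagree; the target trusts the lower estimate.
print(td_target(reward=1.0, done=0.0, q1_next=5.2, q2_next=4.1))  # 1 + 0.99*4.1
```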
Introduction: Value approximation bias is known to lead to suboptimal policies or catastrophic accumulation of overestimation bias that prevents the agent from making the right trade-off between exploration and exploitation. Algorithms have been proposed to mitigate this contradiction. However, we still lack an understanding of how value bias impacts performance, and a method for efficient exploration that keeps updates stable. This study aims to clarify the effect of value bias and to improve reinforcement learning algorithms for better sample efficiency.

Methods: This study designs a simple episodic tabular MDP to study value underestimation and overestimation in actor-critic methods. It proposes a unified framework called Realistic Actor-Critic (RAC), which employs Universal Value Function Approximators (UVFA) to simultaneously learn, with the same neural network, policies that follow different value confidence bounds, each with a different under-overestimation trade-off.

Results: This study highlights that agents can over-explore low-value states due to an inflexible under-overestimation trade-off under fixed hyperparameters, a particular form of the exploration-exploitation dilemma. RAC performs directed exploration without over-exploration using the upper bounds, while still avoiding overestimation using the lower bounds. Through carefully designed experiments, the study empirically verifies that RAC achieves 10x sample efficiency and a 25% performance improvement over Soft Actor-Critic in the most challenging Humanoid environment. All source code is available at https://github.com/ihuhuhu/RAC.

Discussion: This research not only provides valuable insights into the exploration-exploitation trade-off by studying how often policies visit low-value states under the guidance of different value confidence bounds, but also proposes a new unified framework that can be combined with current actor-critic methods to improve sample efficiency in the continuous control domain.
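The abstract does not spell out how the confidence bounds are parameterized. One common construction, assumed here purely for illustration, is mean − β·std over a critic ensemble, with β supplied as a UVFA-style extra input; the actual formulation lives in the linked repository.

```python
# Illustrative confidence-bound construction for a RAC-style agent.
# ASSUMPTION: the bound is parameterized as mean - beta*std over a critic
# ensemble, with beta fed to the policy as a UVFA-style input; see
# https://github.com/ihuhuhu/RAC for the authors' actual formulation.
import numpy as np

def q_bound(q_ensemble, beta):
    """beta > 0 -> lower bound (stable updates); beta < 0 -> upper bound (exploration)."""
    q = np.asarray(q_ensemble)
    return q.mean(axis=0) - beta * q.std(axis=0)

q_heads = [3.0, 3.4, 2.8, 3.2]            # one state-action, four critic heads
print(q_bound(q_heads, beta=+1.0))        # pessimistic value for stable updates
print(q_bound(q_heads, beta=-1.0))        # optimistic value for directed exploration
# A UVFA-style actor pi(a | s, beta) trained across sampled betas then yields
# a family of policies spanning the under-overestimation trade-off.
```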