“…Compared to classical reinforcement-learning models, the A2C algorithm has a stronger learning ability because it comprises two neural networks, an Actor and a Critic [30,31]. A2C also supports synchronous parallel sampling during training, which ensures the diversity of the collected data on the one hand and improves learning efficiency on the other [32]. Moreover, compared with the Q-learning [33,34] and DQN [35,36] algorithms, A2C is better suited to continuous-space problems, which makes it applicable to the vehicle-swarm control problem in this study [37,38]. To reflect the cooperative obstacle-avoidance behaviour of the automated vehicle swarm, the optimization target in the A2C algorithm incorporates not only the safety and efficiency of an individual vehicle but also the efficiency of the vehicle swarm.…”
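The actor-critic structure described above can be illustrated with a minimal sketch. The code below is an assumption-laden toy, not the paper's actual controller: it uses linear function approximators instead of the paper's neural networks, a one-dimensional continuous action with a Gaussian policy, and a hypothetical `LinearActorCritic` class. It shows the division of labour the passage describes: the Critic estimates a state value used to form an advantage, and the Actor is updated by a policy gradient weighted by that advantage.

```python
import numpy as np

rng = np.random.default_rng(0)

class LinearActorCritic:
    """Toy actor-critic for a 1-D continuous action (illustrative only)."""

    def __init__(self, state_dim, lr=0.1, gamma=0.9):
        self.w_mu = np.zeros(state_dim)   # actor: mean of a Gaussian policy
        self.std = 1.0                    # fixed exploration noise (kept constant here)
        self.w_v = np.zeros(state_dim)    # critic: linear state-value function
        self.lr, self.gamma = lr, gamma

    def act(self, s):
        # Sample a continuous action from the Gaussian policy N(mu(s), std^2).
        mu = self.w_mu @ s
        return rng.normal(mu, self.std)

    def update(self, s, a, r, s_next, done):
        # Critic: one-step TD target and advantage estimate A(s,a) = TD error.
        v, v_next = self.w_v @ s, self.w_v @ s_next
        td_target = r + (0.0 if done else self.gamma * v_next)
        advantage = td_target - v
        # Critic update: move V(s) toward the TD target.
        self.w_v += self.lr * advantage * s
        # Actor update: Gaussian policy gradient, scaled by the advantage.
        mu = self.w_mu @ s
        grad_mu = (a - mu) / (self.std ** 2)
        self.w_mu += self.lr * advantage * grad_mu * s
        return advantage
```

In the synchronous-parallel form the passage mentions, several copies of the environment would each produce `(s, a, r, s_next)` transitions in lockstep, and the two updates above would be applied to the averaged gradients, which is what gives A2C both its data diversity and its learning efficiency.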