Multi-Agent systems have broad applications in the real world, yet their safety is rarely considered. Reinforcement learning is one of the most important methods for solving Multi-Agent problems. At present, progress has been made in applying Multi-Agent reinforcement learning to robot systems, human-machine games, automation, and other areas. In these areas, however, an agent may fall into unsafe states in which it is difficult to bypass obstacles, to receive information from other agents, and so on. Ensuring the safety of a Multi-Agent system is therefore of great importance, because an agent may fall into dangerous states that are irreversible and cause great damage.
To address this safety problem, this paper introduces a Multi-Agent Cooperation Q-Learning Algorithm based on Constrained Markov Games. In this method, safety constraints are imposed on the set of actions: each agent, while interacting with the environment to search for optimal values, must obey the safety rules, so that the resulting optimal policy satisfies the safety requirements.
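As a minimal sketch of the underlying optimization, assuming safety is encoded as expected discounted cost constraints in the usual constrained-MDP style (the cost functions $c_j$ and thresholds $d_j$ below are illustrative notation, not taken from this paper), each agent $i$ solves
\[
\max_{\pi_i}\; \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_i(s_t, a_t^{1}, \dots, a_t^{n})\right]
\quad \text{s.t.} \quad
\mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c_j(s_t, a_t^{1}, \dots, a_t^{n})\right] \le d_j, \qquad j = 1, \dots, m,
\]
where the expectations are taken over the joint policy $(\pi_1, \dots, \pi_n)$ and the game dynamics.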
Since traditional Multi-Agent reinforcement learning algorithms are no longer suitable for the proposed model, a new solution is introduced for calculating the global optimal state-action function that satisfies the safety constraints. Under the condition that the state-action function and the constraint functions are both differentiable, we use the Lagrange multiplier method, after linearizing the constraint functions, to determine the optimal action that can be performed in the current state; this not only improves the efficiency and accuracy of the algorithm, but also guarantees that the globally optimal solution is obtained.
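A minimal sketch of this action-selection step, assuming a primal-dual update on the Lagrangian with the constraint linearized around the current action; the function names and the toy problem below are illustrative, not the paper's implementation:

```python
import numpy as np

def constrained_action(q_grad, g, g_grad, a0, lr=0.1, steps=200):
    """Select an action maximizing Q(s, a) subject to g(s, a) <= 0.

    q_grad(a): gradient of the state-action value Q(s, .) at action a
    g(a), g_grad(a): constraint value and gradient (g(a) <= 0 is safe)
    The constraint is linearized around the initial action a0, and a
    Lagrange multiplier lam enforces it via dual ascent.
    """
    a = np.asarray(a0, dtype=float)
    lam = 0.0
    for _ in range(steps):
        # Linearized constraint: g(a0) + grad g(a0) . (a - a0) <= 0
        g_lin = g(a0) + g_grad(a0) @ (a - a0)
        # Primal step: ascend the Lagrangian L = Q(a) - lam * g_lin in a.
        a = a + lr * (q_grad(a) - lam * g_grad(a0))
        # Dual step: raise lam while the linearized constraint is violated.
        lam = max(0.0, lam + lr * g_lin)
    return a

# Toy usage: Q(a) = -(a - 1)^2 peaks at a = 1, but the safety rule is a <= 0.5.
q_grad = lambda a: -2.0 * (a - 1.0)
g = lambda a: float(a[0] - 0.5)      # g(a) <= 0  <=>  a <= 0.5
g_grad = lambda a: np.array([1.0])
print(constrained_action(q_grad, g, g_grad, a0=np.array([0.0])))  # ~[0.5]
```

In this toy case the iteration settles at the boundary action a = 0.5 with multiplier lam = 1, matching the KKT conditions; differentiability of both the state-action function and the constraint is precisely what makes these gradient steps well defined.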
Experiments verify the effectiveness of the proposed algorithm.