Spatio-temporal crowdsourcing is now a widely used model, yet balancing task-planning efficiency against worker revenue remains challenging. To address this problem, we first formulate crowdsourcing task allocation as a multi-party stochastic game and solve for its equilibrium in the perfect-recall case, which yields the optimal policies for the platform and the workers. Building on this formulation, we propose a dual actor-critic framework that learns collaboratively optimal strategies. We solve for the equilibrium with a Monte Carlo tree search algorithm and optimize the actor and critic parameters with a policy gradient-based approach. We evaluate our approach in several different crowdsourcing scenarios. The experimental results show that our method performs well across these scenarios and solves for the equilibrium with high accuracy and stability. We will release our source code so that other researchers can reproduce our results.
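
To make the actor-critic idea concrete, the following is a minimal sketch of a single-state actor-critic update driven by a policy gradient. It is not the dual actor-critic framework or the Monte Carlo tree search procedure described above; it only illustrates how an actor (a softmax policy over candidate tasks) and a critic (a scalar value estimate) can be updated from the same error signal. The function names, the reward values, and the learning rates are all hypothetical choices made for illustration.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_actor_critic(mean_rewards, episodes=5000,
                       alpha_actor=0.1, alpha_critic=0.1, seed=0):
    """Single-state actor-critic on a toy task-selection problem.

    mean_rewards[i] is the (hypothetical) expected revenue of task i.
    Returns the learned policy and the critic's value estimate.
    """
    rng = random.Random(seed)
    prefs = [0.0] * len(mean_rewards)  # actor parameters
    value = 0.0                        # critic parameter
    for _ in range(episodes):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        # Noisy reward for the chosen task (assumed Gaussian noise).
        reward = mean_rewards[action] + rng.gauss(0.0, 0.1)
        # Each episode is one step, so the critic's target is the reward.
        td_error = reward - value
        value += alpha_critic * td_error
        # Policy gradient of log pi(action) w.r.t. each softmax preference.
        for a in range(len(prefs)):
            grad_log_pi = (1.0 if a == action else 0.0) - probs[a]
            prefs[a] += alpha_actor * td_error * grad_log_pi
    return softmax(prefs), value


if __name__ == "__main__":
    policy, value = train_actor_critic([0.2, 1.0, 0.5])
    print("policy:", [round(p, 3) for p in policy])
    print("value estimate:", round(value, 3))
```

In this sketch the critic serves only as a baseline that reduces the variance of the actor's gradient; a multi-agent setting such as the one studied here would need one such pair per party and a joint notion of equilibrium, which this example does not attempt to model.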