Mechanism design theory can be applied not only in economics but also in fields such as politics and military affairs, and it has practical and strategic significance for countries undergoing institutional innovation and transformation. As Nobel Laureate Paul said, the complexity of the real economy makes it difficult for “Unorganized Markets” to ensure supply-demand balance and the efficient allocation of resources. When traditional economic theory cannot explain or compute the complex scenarios found in practice, a high-performance computing solution grounded in that theory is needed to evaluate mechanisms and to achieve better social welfare. Mechanism design theory is undoubtedly the best option. Unlike existing works, which rely on theoretical exploration of optimal solutions or on single-perspective analysis of scenarios, this paper focuses on more realistic and complex markets and seeks to identify the common difficulties and feasible solutions in applications. First, we review the history of traditional mechanism design and algorithmic mechanism design. We then present the main challenges in designing data-driven market mechanisms in practice, including the challenges inherent in mechanism design theory, the challenges brought by new markets, and the challenges common to both. We also survey and discuss the theoretical support and computer-aided methods in detail. This paper serves as a guide for cross-disciplinary researchers exploring the resource allocation problem in real markets for the first time and offers a different perspective for researchers working on complex social problems. Finally, we propose new ideas and discuss future directions.
In recent years, deep reinforcement learning (DRL) has achieved great success in many fields, especially in games, with systems such as AlphaGo, AlphaZero, and AlphaStar. However, because of the reward sparsity problem, traditional DRL-based methods show limited performance in 3D games, which have a much higher-dimensional state space. To address this problem, we propose an intrinsic-based policy optimization (IBPO) algorithm. In IBPO, a novel intrinsic reward is integrated into the value network, providing an additional reward in environments with sparse rewards and thereby accelerating training. In addition, to deal with value estimation bias, we design three types of auxiliary tasks that allow the state value and the action to be evaluated more accurately in 3D scenes. Finally, we propose a framework of auxiliary intrinsic-based policy optimization (AIBPO), which improves on the performance of IBPO. The experimental results show that the method deals with the reward sparsity problem effectively. The proposed method may therefore be applied to real-world scenarios such as 3D navigation and autonomous driving, where it can improve sample utilization and reduce the cost of collecting interaction samples with real equipment.
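The general idea of supplementing a sparse environment reward with an intrinsic reward can be illustrated with a minimal sketch. This is not the IBPO intrinsic reward, whose form the abstract does not specify; it assumes a simple count-based novelty bonus, and the names `CountBonus`, `shaped_reward`, and `beta` are hypothetical.

```python
import math
from collections import defaultdict


class CountBonus:
    """Hypothetical count-based novelty bonus: beta / sqrt(N(s)).

    Assumption for illustration only; the abstract does not describe
    how the IBPO intrinsic reward is actually computed.
    """

    def __init__(self, beta=0.1):
        self.beta = beta
        self.counts = defaultdict(int)  # visit count per (hashable) state

    def __call__(self, state):
        # Record the visit, then return a bonus that shrinks as the
        # state is seen more often.
        self.counts[state] += 1
        return self.beta / math.sqrt(self.counts[state])


def shaped_reward(extrinsic, state, bonus):
    """Total training reward: sparse extrinsic reward plus intrinsic bonus."""
    return extrinsic + bonus(state)
```

With this kind of shaping, an agent still receives a nonzero learning signal in states where the environment reward is zero, and the bonus decays for states it has already visited many times.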