With the development of economic globalization, culture has become a key factor supporting the sustainability of foreign direct investment (FDI), especially for multinational enterprises. This paper takes the Chinese capital market as a sample and, drawing on interviews with managers of international joint-venture securities firms (IJVS), finds that the culture participants form in developed and emerging capital markets has a significant impact on IJVS performance. Using the degree of price fluctuation to measure the risk culture of each capital market, this paper observes that the risk culture in the Chinese capital market is significantly stronger than that of developed countries. This paper also finds that the stronger the risk culture of IJVS shareholders, the better they can adapt to the environment of the Chinese capital market and the better the performance they can achieve. Furthermore, risk culture distance, calculated from the difference in risk culture between foreign shareholders and the Chinese capital market, is significantly negatively correlated with IJVS performance and efficiency.
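A minimal sketch of one plausible operationalization of these measures, assuming "degree of price fluctuation" is proxied by annualized volatility of daily log returns and "risk culture distance" by an absolute difference of the two proxies (the exact measures, data, and function names here are assumptions, not taken from the paper):

```python
import numpy as np

def risk_culture_proxy(prices: np.ndarray) -> float:
    """Proxy a market's risk culture by the annualized volatility of daily log returns."""
    log_returns = np.diff(np.log(prices))
    return float(log_returns.std(ddof=1) * np.sqrt(252))

def risk_culture_distance(home_prices: np.ndarray, host_prices: np.ndarray) -> float:
    """Distance between a foreign shareholder's home-market risk culture and the
    host (Chinese) market's risk culture, taken as the absolute difference."""
    return abs(risk_culture_proxy(home_prices) - risk_culture_proxy(host_prices))
```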
Offline reinforcement learning (RL) shows promise for applying RL to real-world problems by effectively utilizing previously collected data. Most existing offline RL algorithms use regularization or constraints to suppress extrapolation error for actions outside the dataset. In this paper, we adopt a different framework, which learns the V-function instead of the Q-function to naturally keep the learning procedure within the support of an offline dataset. To enable effective generalization while maintaining proper conservatism in offline learning, we propose Expectile V-Learning (EVL), which smoothly interpolates between optimal value learning and behavior cloning. Further, we introduce implicit planning along offline trajectories to enhance learned V-values and accelerate convergence. Together, we present a new offline method called Value-based Episodic Memory (VEM). We provide theoretical analysis of the convergence properties of the proposed VEM method, and empirical results on the D4RL benchmark show that our method achieves superior performance in most tasks, particularly in sparse-reward tasks.
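The core of EVL is expectile regression on TD targets built only from dataset transitions: an expectile level of 0.5 recovers the behavior value, while levels approaching 1 approach the optimal in-support value. Below is a minimal, illustrative PyTorch sketch of such an update (the network, batch keys, and hyperparameter names are assumptions, not the authors' reference implementation):

```python
import torch
import torch.nn as nn

def expectile_loss(diff: torch.Tensor, tau: float) -> torch.Tensor:
    """Asymmetric squared loss: weights positive errors by tau, negative errors by 1 - tau."""
    weight = torch.where(diff > 0, tau, 1.0 - tau)
    return (weight * diff.pow(2)).mean()

def evl_update(v_net: nn.Module, v_target: nn.Module, batch: dict,
               tau: float, gamma: float, optimizer: torch.optim.Optimizer) -> float:
    """One Expectile V-Learning step on a batch of dataset transitions.

    tau = 0.5 recovers the behavior value (behavior-cloning side);
    tau -> 1 approaches the optimal value within the dataset's support.
    """
    with torch.no_grad():
        target = batch["rewards"] + gamma * (1.0 - batch["dones"]) * \
                 v_target(batch["next_obs"]).squeeze(-1)
    v = v_net(batch["obs"]).squeeze(-1)
    loss = expectile_loss(target - v, tau)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```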
Learning from datasets without interaction with environments (offline learning) is an essential step toward applying reinforcement learning (RL) algorithms in real-world scenarios. However, compared with its single-agent counterpart, offline multi-agent RL involves more agents and larger state and action spaces, which makes it more challenging, yet it has attracted little attention. We demonstrate that current offline RL algorithms are ineffective in multi-agent systems due to accumulated extrapolation error. In this paper, we propose a novel offline RL algorithm, named Implicit Constraint Q-learning (ICQ), which effectively alleviates extrapolation error by trusting only the state-action pairs given in the dataset for value estimation. Moreover, we extend ICQ to multi-agent tasks by decomposing the joint policy under the implicit constraint. Experimental results demonstrate that the extrapolation error is reduced to almost zero and is insensitive to the number of agents. We further show that ICQ achieves state-of-the-art performance on challenging multi-agent offline tasks (StarCraft II).

Introduction

Recently, reinforcement learning (RL), an active learning process, has achieved massive success in domains ranging from strategy games [51] to recommendation systems [6]. However, applying RL to real-world scenarios poses practical challenges: interaction with the real world, such as in autonomous driving, is usually expensive or risky. To address these issues, offline RL is an excellent choice for dealing with practical problems [2,22,30,36,13,24,3,21,46,10], aiming to learn from a fixed dataset without interaction with environments.

The greatest obstacle in offline RL is the distribution shift issue [14], which leads to extrapolation error, a phenomenon in which unseen state-action pairs are erroneously estimated. Unlike in the online setting, the inaccurate value estimates of unseen pairs cannot be corrected by interacting with the environment. Therefore, most off-policy RL algorithms fail on offline tasks due to intractable overestimation. Modern offline methods (e.g., Batch-Constrained deep Q-learning (BCQ) [14]) aim to keep the learned policy close to the behavior policy or to suppress the Q-value directly. These methods have achieved massive success on challenging single-agent offline tasks such as D4RL [12].

However, many decision processes in real-world scenarios are multi-agent systems, such as intelligent transportation systems [1], sensor networks [31], and power grids [5]. We demonstrate that the number of unseen state-action pairs grows exponentially as the number of agents increases in multi-agent systems, quickly accumulating extrapolation error. Moreover, the current offline algorithms ...
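To illustrate the core idea, an ICQ-style policy-evaluation target can be computed using only next actions that appear in the dataset, replacing the usual max over actions with a softmax-weighted value over the sampled batch so that no out-of-dataset action is ever queried. The following is a minimal, illustrative PyTorch sketch (the function signature, batch keys, and the batch-level normalization are assumptions, not the authors' reference implementation):

```python
import torch
import torch.nn.functional as F

def icq_targets(q_target, batch: dict, alpha: float, gamma: float) -> torch.Tensor:
    """Compute ICQ-style targets from dataset (s', a') pairs only.

    Instead of max_a' Q(s', a'), which can evaluate unseen actions, the next-state
    value is a softmax-weighted Q over dataset actions in the batch. alpha controls
    how tightly the implicit policy stays near the behavior policy.
    """
    with torch.no_grad():
        q_next = q_target(batch["next_obs"], batch["next_actions"]).squeeze(-1)
        # Softmax weights over the batch approximate the implicitly constrained policy;
        # multiplying by batch size keeps the average weight at 1 (importance-sampling style).
        weights = F.softmax(q_next / alpha, dim=0) * q_next.shape[0]
        targets = batch["rewards"] + gamma * (1.0 - batch["dones"]) * weights * q_next
    return targets
```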