Smooth Q-Learning: An Algorithm for Independent Learners in Stochastic Cooperative Markov Games

Amhraoui, Elmehdi; Masrour, Tawfik

doi:10.1007/s10846-023-01917-z

Cited by 2 publications

(1 citation statement)

References 21 publications


“…According to the Bellman Equation [33,34], given a search strategy π, we define Q as the search state s_t, and search action a_t and the expectation of the reward discount sum of the subsequent time steps in strategy π. The implementation of the Q-learning method is as follows: At each time step t, we observe the current search state s_t, and select and execute the search action a_t.…”

Section: Algorithm Designing (mentioning)

confidence: 99%
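
The step quoted above (observe s_t, select and execute a_t, then update Q) corresponds to the standard tabular Q-learning rule. The following is a minimal sketch of that textbook rule, not the citing paper's implementation: the state and action names, the epsilon-greedy selection, and the values of `alpha`, `gamma`, and `epsilon` are all illustrative assumptions.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q(s_t, a_t) toward the TD target."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical search states and actions, used only for illustration.
q = defaultdict(float)            # unseen (state, action) pairs start at 0.0
actions = ["platform_a", "platform_b"]
rng = random.Random(0)

chosen = epsilon_greedy(q, "s0", actions, epsilon=0.1, rng=rng)
new_value = q_update(q, "s0", "platform_a", reward=1.0,
                     next_state="s1", actions=actions)
```

With an all-zero table, a reward of 1.0, and `alpha = 0.1`, the updated entry moves one tenth of the way toward the target, which is the behavior the update rule is meant to show.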

Collaborative Search Model for Lost-Link Borrowers Information Based on Multi-Agent Q-Learning

You, Guo, Dagestani et al. 2023

Axioms


To reduce the economic losses caused by debt evasion amongst lost-link borrowers (LBs) and improve the efficiency of finding information on LBs, this paper focuses on the cross-platform information collaborative search optimization problem for LBs. Given the limitations of platform/system heterogeneity, data type diversity, and the complexity of collaborative control in cross-platform information search for LBs, a collaborative search model for LBs’ information based on multi-agent technology is proposed. Additionally, a multi-agent Q-learning algorithm for the collaborative scheduling of multi-search subtasks is designed. We use the Q-learning algorithm based on function approximation to update the description model of the LBs. The multi-agent collaborative search problem is transformed into a reinforcement learning problem by defining search states, search actions, and reward functions. The results indicate that: (i) this model greatly improves the comprehensiveness and accuracy of the search for key information of LBs compared with traditional search engines; (ii) during searching for the information of LBs, the agent is more inclined to search on platforms and data types with larger environmental rewards, and the multi-agent Q-learning algorithm has a stronger ability to acquire information value than the transition probability matrix algorithm and the probability statistical algorithm for the same number of searches; (iii) the optimal search times of the multi-agent Q-learning algorithm are between 14 and 100. Users can flexibly set the number of searches within this range. It is significant for improving the efficiency of finding key information related to LBs.
