Learning to Collaborate in Multi-Module Recommendation via Multi-Agent Reinforcement Learning without Communication

He, Xu; An, Bo; Li, Yanghua; Chen, Haikai; Wang, Rundong; Wang, Xinrun; Rong, Yu; Li, Xin; Wang, Zhirong

doi:10.1145/3383313.3412233

Cited by 26 publications

(19 citation statements)

References 26 publications

(31 reference statements)


“…Generate the prototype item $a^p_t$ and the recommended item $a^r_t$ using Equations (6) and (7), respectively; Compute the losses $L_q$, $L_{r,c}$ based on the minibatch, using Equations (17) and (20), respectively;…”

Section: Loss Function Of the Critic (mentioning)

confidence: 99%

“…RL-based models aim to learn an optimal strategy to maximize the long-term rewards. RL-based models can be divided into three categories: the policy-based methods [4], [33], [31], [1], [12], [2], [30], the value-based methods [32], [34], [37], [11], and the actor-critic based methods [28], [6], [5], [7], [29], [24]. Chen et al. [1] propose to use a balanced hierarchical clustering tree to tackle the large action space problem.…”

Section: RL-based Recommendation (mentioning)

confidence: 99%

“…Zhou et al. [35] propose to obtain item representations through a graph convolution network and users' state embedding vectors through a recurrent neural network, and then train the recommendation strategy with the Q-learning algorithm. He et al. [7] and Feng et al. [5] treat each agent as a scenario and use a multi-agent framework to improve the total performance of all scenarios. Zhang et al. [28] apply a multi-agent reinforcement learning method to coauthor network analysis for the dynamic collaboration recommendation.…”

Section: RL-based Recommendation (mentioning)

confidence: 99%

“…To address the above challenges, motivated by the success of the reinforcement learning (RL) based recommendation models [32], [15], [36], [17], [5], [9], [7], we propose a novel RL-based model, called Temporary Interest Aware Recommendation (TIARec), which is able to distinguish atypical interactions from normal ones without annotations and capture the temporary interest as well as the general preference. TIARec contains a recommender agent and an auxiliary classifier agent.…”

Section: Introduction (mentioning)

confidence: 99%

Learning from Atypical Behavior: Temporary Interest Aware Recommendation Based on Reinforcement Learning

Du, Yang, Yu. 2022. Preprint.


Traditional robust recommendation methods view atypical user-item interactions as noise and aim to reduce their impact with some kind of noise filtering technique, which often suffers from two challenges. First, in the real world, atypical interactions may signal a temporary interest that differs from the user's general preference. Therefore, simply filtering out atypical interactions as noise may be inappropriate and degrade the personalization of recommendations. Second, it is hard to capture the temporary interest since there are no explicit supervision signals to indicate whether an interaction is atypical or not. To address these challenges, we propose a novel model called Temporary Interest Aware Recommendation (TIARec), which can distinguish atypical interactions from normal ones without supervision and capture the temporary interest as well as the general preference of users. In particular, we propose a reinforcement learning framework containing a recommender agent and an auxiliary classifier agent, which are jointly trained with the objective of maximizing the cumulative return of the recommendations made by the recommender agent. During the joint training process, the classifier agent judges whether the interaction with an item recommended by the recommender agent is atypical, and the knowledge about learning temporary interest from atypical interactions is transferred to the recommender agent, which enables the recommender agent to make, on its own, recommendations that balance the general preference and temporary interest of users. Finally, experiments conducted on real-world datasets verify the effectiveness of TIARec.

“…In recent years, to encourage recommendation models to search for meaningful paths rather than enumerate all possible paths in KGs, RL has been gradually introduced into recommendation. Some RL-based recommendation models [4,9,41] have achieved outstanding performance in recommendation. For example, Song et al. [20] proposed a knowledge-aware recommendation model that generates meaningful paths from users to relevant items by learning a walking policy on the user-item-entity graph, which is designed to deal with the data sparsity and cold start problems.…”

Section: Recommendation With RL (mentioning)

confidence: 99%

Reinforced KGs reasoning for explainable sequential recommendation

et al. 2021


We explore the semantic-rich structured information derived from the knowledge graphs (KGs) associated with the user-item interactions and aim to reason out the motivations behind each successful purchase behavior. Existing works on KGs-based explainable recommendations focus purely on path reasoning based on current user-item interactions, which generally leaves them unable to infer users' subsequent preferences. Considering this, we attempt to model the KGs-based explainable recommendation in sequential settings. Specifically, we propose a novel architecture called Reinforced Sequential Learning with Gated Recurrent Unit (RSL-GRU), which is composed of a Reinforced Path Reasoning Network (RPRN) component and a GRU component. RSL-GRU takes users' sequential behaviors and their associated KGs in chronological order as input and outputs potential top-N items for each user with appropriate reasoning paths from a global perspective. Our RPRN features a remarkable path reasoning capacity, which is regulated by a user-conditioned derivatively action pruning strategy, a soft reward strategy based on an improved multi-hop scoring function, and a policy-guided sequential path reasoning algorithm. Experimental results on four of Amazon's large-scale datasets show that our method achieves excellent results compared with several state-of-the-art alternatives.


Reinforcement Learning-Based Recommendation with User Reviews on Knowledge Graphs

Zhang, Ouyang, Liu, et al. 2023. Lecture Notes in Computer Science.
