2020 International Conference on Wireless Communications and Signal Processing (WCSP)
DOI: 10.1109/wcsp49889.2020.9299725

Social Bandit Learning: Strangers Can Help

Cited by 1 publication (2 citation statements)
References 19 publications
“…To assess the performance of our social learning algorithm in comparison to alternative methods, we created various scenarios involving learning from non-learners or different types of individual learners. In this section, we compared the performance of our method, SBL-FE, with TUCB [27], OUCB [28] (as social learning algorithms), TS, and UCB (as individual learning methods), using the cumulative regret criteria. We used the same hyperparameters for OUCB and TUCB as stated in their respective papers, and for all subsequent results, we employed the same hyperparameter set.…”
Section: A. The Ability of Social Learning Methods in Different Societies
confidence: 99%
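The excerpt above evaluates all policies by cumulative regret, i.e., the accumulated gap between the best arm's expected reward and the expected reward of the arm actually played. Below is a minimal sketch of that criterion in Python, assuming hypothetical arm means and an arbitrary action sequence; it is not the SBL-FE, TUCB, or OUCB code from the cited papers.

import numpy as np

def cumulative_regret(chosen_arms, arm_means):
    # Cumulative pseudo-regret of a bandit run.
    # chosen_arms: arm indices played by any policy (individual or social).
    # arm_means:   true expected rewards per arm, assumed known for evaluation.
    arm_means = np.asarray(arm_means, dtype=float)
    gaps = arm_means.max() - arm_means[np.asarray(chosen_arms)]
    return np.cumsum(gaps)

# Example with three hypothetical arms; the action sequence is arbitrary.
print(cumulative_regret([0, 1, 2, 2, 1, 2, 2], [0.3, 0.5, 0.7]))

A policy with lower cumulative regret over the horizon is the better learner under this criterion, which is how the citing paper compares SBL-FE against TUCB, OUCB, TS, and UCB.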
“…Our work is mostly related to [27] and [28], who proposed a social bandit learning algorithm inspired by the Upper Confidence Bound (UCB) learning method to enhance agents' decisions by considering other agents' actions. Both methods are based on the optimism principle about the average of observed policies.…”
Section: Related Work
confidence: 99%
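The optimism principle referenced in this excerpt means each agent acts on an upper confidence bound (empirical mean plus an exploration bonus) rather than on the raw empirical mean. The sketch below shows a standard UCB-style index; the social_pulls term, counting pulls of an arm observed from other agents, is only a hypothetical illustration of how social observations could enter such a bound and is not the actual TUCB or OUCB construction from [27], [28].

import math

def optimistic_index(emp_mean, own_pulls, social_pulls, t, c=2.0):
    # Upper-confidence index: empirical mean plus an exploration bonus.
    # Letting social_pulls (pulls of this arm observed from other agents)
    # shrink the bonus is an illustrative assumption, not the method of [27], [28].
    n = max(own_pulls + social_pulls, 1)
    return emp_mean + math.sqrt(c * math.log(max(t, 2)) / n)

# At each round t, an optimistic agent plays the arm with the largest index.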