“…Multi-Agent Multi-Armed Bandits. Existing decentralized cooperative MAB algorithms make one or more of the following assumptions: (i) agents independently interact with the same MAB (Lupu, Durand, and Precup 2019); (ii) they use sophisticated communication protocols to exchange information about rewards and the number of times each action was played (Landgren, Srivastava, and Leonard 2016; Martínez-Rubio, Kanade, and Rebeschini 2019; Sankararaman, Ganesh, and Shakkottai 2019; Shahrampour, Rakhlin, and Jadbabaie 2017); or (iii) when sophisticated communication is not possible, agents share their latest action and reward (Madhushani and Leonard 2019). These assumptions often simplify the analysis, but they are unrealistic in most human-AI interactions, e.g., collaborative transport, assembly, cooking, or autonomous driving, where (i) agents' actions influence the outcome for the whole team, (ii) agents lack explicit communication channels or may have different state and action representations that are difficult to communicate, or (iii) agents have different capabilities, e.g., noisier sensors.…”
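To make assumption (iii) concrete, the sketch below shows two UCB1 agents playing the same bandit, where each round every agent broadcasts only its latest action and reward to the other. This is a minimal illustrative toy under assumed arm means, not an implementation of any of the cited algorithms:

```python
import math
import random

class UCBAgent:
    """UCB1 agent that can also absorb a neighbor's shared (action, reward)."""
    def __init__(self, n_arms):
        self.counts = [0] * n_arms   # pulls observed per arm (own + shared)
        self.sums = [0.0] * n_arms   # cumulative reward per arm
        self.t = 0                   # number of own decision rounds

    def select(self):
        self.t += 1
        # Try every arm once before trusting the UCB index.
        for a, c in enumerate(self.counts):
            if c == 0:
                return a
        # Standard UCB1 index: empirical mean + exploration bonus.
        return max(
            range(len(self.counts)),
            key=lambda a: self.sums[a] / self.counts[a]
            + math.sqrt(2 * math.log(self.t) / self.counts[a]),
        )

    def observe(self, action, reward):
        self.counts[action] += 1
        self.sums[action] += reward

# Hypothetical Bernoulli arm means; two agents face the same bandit and,
# each round, share only their latest action and reward (assumption (iii)).
means = [0.2, 0.5, 0.8]
agents = [UCBAgent(len(means)) for _ in range(2)]
random.seed(0)
for _ in range(500):
    plays = [ag.select() for ag in agents]
    results = [(a, 1.0 if random.random() < means[a] else 0.0) for a in plays]
    for ag, (a, r) in zip(agents, results):
        ag.observe(a, r)          # own feedback
    for i, ag in enumerate(agents):
        ag.observe(*results[1 - i])  # neighbor's shared action and reward
```

Because each agent effectively sees twice as many samples as it plays, this simple sharing rule speeds up identification of the best arm; the paragraph's point is that such explicit sharing is exactly what may be unavailable in human-AI teams.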