“…Explicit human feedback has also been incorporated into reinforcement learning methods (Knox and Stone, 2009; Pilarski et al., 2011; Daniel et al., 2015; Mathewson and Pilarski, 2016; MacGlashan et al., 2017; Warnell et al., 2018; Arumugam et al., 2019), including in the context of dialogue system learning (Liu et al., 2018). Jaques et al. (2020) study forming a reward from implicit feedback for non-task-oriented dialogue language generation by training multiple models to detect linguistic signals, such as sentiment and lexical overlap, that correlate with explicit user feedback. Learning from users has also been studied by asking users to rank system outputs (e.g., Wilson et al., 2012; Christiano et al., 2017), including for instruction following (Wang et al., 2016) and summarization (Stiennon et al., 2020).…”