Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval 2016
DOI: 10.1145/2911451.2914798

Online Learning to Rank for Information Retrieval

Abstract: Over the past 10-15 years, offline learning to rank has had a tremendous influence on information retrieval, both scientifically and in practice. Recently, as the limitations of offline learning to rank have become apparent, online learning to rank methods have received increasing attention in the information retrieval community. Such methods learn from user interactions rather than from a set of labeled data that is fully available for training up front. Below we describe why we be…

Cited by 45 publications (27 citation statements) · References 34 publications (29 reference statements)

“…This problem is usually formalized as a multi-armed bandit problem [63] or a contextual bandit problem [53]. Both views are extensively covered in recent tutorials [32,33,61], which discuss such problems as dueling bandit gradient descent [77] and the exploration vs. exploitation trade-off [38]. Lattimore and Szepesvári [52,Chapter 32] present a theoretical framework for using bandit algorithms for IR, and highlight unique challenges and ways to address them in the online setting.…”
Section: Scoring and Ranking
confidence: 99%
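
The exploration vs. exploitation trade-off discussed in this statement can be made concrete with a small sketch. Below is a minimal epsilon-greedy multi-armed bandit in Python, with rankers as arms and click feedback as reward; the arm count, epsilon value, and binary click reward are illustrative assumptions, not details from the cited works:

import random

class EpsilonGreedyRankerBandit:
    def __init__(self, n_rankers, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_rankers    # times each ranker was shown
        self.values = [0.0] * n_rankers  # running mean click reward per ranker

    def select(self):
        # Explore a random ranker with probability epsilon, else exploit the best.
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental running-mean update for the chosen ranker.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

bandit = EpsilonGreedyRankerBandit(n_rankers=3)
arm = bandit.select()
bandit.update(arm, reward=1.0)  # e.g., 1.0 if the shown ranking received a click

Dueling bandit methods such as DBGD replace the per-arm reward estimate with pairwise comparisons between rankers, but the same trade-off between trying new rankers and serving the current best drives both families.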
“…
Ai et al [3]: Unbiased learning to rank
Arguello [5]: Aggregated search
Barocas and Hardt [8]: Fairness in machine learning
Bast et al [9]: Semantic search, knowledge graphs
Budylin et al [12,13]: Online evaluation
Burges [14]: Learning to rank
Cai and de Rijke [16]: Query auto-completion
Cambazoglu and Baeza-Yates [17]: Infrastructure
Chuklin et al [21,22,23,24]: Click models
Crestani et al [26]: Mobile information retrieval
Gao et al [30]: Conversational search
Glowacka [32]: Bandit algorithms
Grotov and de Rijke [33]: Online learning to rank
Hajian et al [35]: Algorithmic bias
Hofmann et al [40]: Online evaluation
Hui Yang and Zhang [41]: Differential privacy in information retrieval
Joachims and Swaminathan [44]: Counterfactual evaluation and learning
Jones [45]: Mobile search
Kanoulas [46]: Online and offline evaluation
Kelly [47]: User studies
Kenter et al [48]: Neural methods in information retrieval
Knijnenburg and Berkovsky [49]: Privacy in recommender systems
Lalmas [51]: XML retrieval
Lattimore and Szepesvári [52]: Bandit algorithms
Liu [55]: Offline learning to rank
Mehrotra et al [57]: Task understanding
Mitra and Craswell [58]: Neural methods in information retrieval
Onal et al [60]: Neural methods in information retrieval
Oosterhuis [61]: Online evaluation and ranking
Ren et al [66]: E-commerce
Sakai [67]: Experimental design and methodology
Santos et al [68]: Diversification
Silvestri…”
Section: Author(s) Topic
confidence: 99%
“…It is difficult to evaluate the effectiveness of online and reinforcement learning algorithms for information systems in a live setting with real users because it requires a very long time and a large amount of resources [30,31,51,58,63]. Thus, most studies in this area use purely simulated user interactions [31,51,58].…”
Section: Poisson
confidence: 99%
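
The simulated user interactions mentioned in this statement typically come from a click model. Below is a minimal sketch of a cascade-style click simulator in Python; the relevance probabilities and stop probability are hypothetical values for illustration, not parameters from the cited studies:

import random

def simulate_clicks(ranking, relevance, stop_prob=0.5):
    # Scan the ranking top-down; click a document with probability equal to
    # its (assumed) relevance, then stop examining with probability stop_prob.
    clicks = []
    for rank, doc in enumerate(ranking):
        if random.random() < relevance[doc]:
            clicks.append(rank)
            if random.random() < stop_prob:
                break
    return clicks

relevance = {"d1": 0.9, "d2": 0.2, "d3": 0.6}  # hypothetical relevance probabilities
print(simulate_clicks(["d2", "d1", "d3"], relevance))

Running an online learner against such a simulator stands in for the live users that would otherwise make these experiments slow and expensive.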
“…For example, they assume that the user picks queries to express an intent according to a fixed probability distribution. It is known that learning methods that work well in a static setting do not deliver the desired outcomes in a setting where all agents may modify their strategies [18,30]. Hence, current techniques may not help the DBMS understand the user's information need over a long-term interaction.…”
Section: Introduction
confidence: 99%
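
The fixed-distribution assumption criticized in this statement can be illustrated with a tiny sketch: a simulated user draws queries for an intent from a static categorical distribution that never adapts to the system's behavior. The intent name, query strings, and weights below are hypothetical:

import random

INTENT_QUERIES = {
    # Hypothetical intent with a static distribution over query phrasings.
    "find_flight": (["cheap flights", "flight deals", "airfare"], [0.5, 0.3, 0.2]),
}

def sample_query(intent):
    # The distribution is fixed: it never adapts to the system's responses,
    # which is exactly the assumption the statement above criticizes.
    queries, weights = INTENT_QUERIES[intent]
    return random.choices(queries, weights=weights, k=1)[0]

print(sample_query("find_flight"))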
“…This property enables the experience replay update used in DQN. Third, we propose to apply a Dueling Bandit Gradient Descent (DBGD) method [16,17,49] for exploration, by choosing random item candidates in the neighborhood of the current recommender. This exploration strategy can avoid recommending totally unrelated items and hence maintain better recommendation accuracy.…”
Section: Introduction
confidence: 99%
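
The DBGD-style exploration described in this statement perturbs the current model to obtain a nearby candidate and keeps the direction only if the candidate wins a comparison. Below is a minimal sketch assuming a linear scoring model; the step sizes delta and gamma and the duel() comparison oracle are assumptions, not the authors' implementation:

import numpy as np

def dbgd_step(w, duel, delta=1.0, gamma=0.1):
    # Sample a random direction on the unit sphere and propose a nearby
    # candidate model; if it wins the duel (e.g., an interleaved comparison
    # judged by clicks), take a small step toward it.
    u = np.random.randn(len(w))
    u /= np.linalg.norm(u)
    candidate = w + delta * u
    if duel(candidate, w):  # True when the candidate beats the current model
        w = w + gamma * u
    return w

w = np.zeros(5)                                       # current linear model weights
noisy_duel = lambda cand, cur: np.random.rand() < 0.5  # placeholder duel oracle
w = dbgd_step(w, noisy_duel)

Because each candidate stays within a delta-ball of the current weights, the exploratory recommendations remain close to the current recommender, which is what keeps this strategy from surfacing totally unrelated items.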