2019
DOI: 10.1007/978-3-030-15712-8_25
Optimizing Ranking Models in an Online Setting

Abstract: Online Learning to Rank (OLTR) methods optimize ranking models by directly interacting with users, which allows them to be very efficient and responsive. All OLTR methods introduced during the past decade have extended the original OLTR method: Dueling Bandit Gradient Descent (DBGD). Recently, a fundamentally different approach was introduced with the Pairwise Differentiable Gradient Descent (PDGD) algorithm. To date, the only comparisons of the two approaches are limited to simulations with cascading click …

Cited by 12 publications (10 citation statements) | References 35 publications
“…The gradient estimation of PDGD is unbiased with respect to users' document-pair preferences [31]. PDGD is empirically found to be significantly better than DBGD in terms of final convergence, learning speed, and user experience during optimization, making PDGD the current state-of-the-art method for OLTR [20,64,32,52]. PDGD has also been adapted to the federated OLTR context [51], again exhibiting state-of-the-art performance.…”
Section: Online Learning To Rank
confidence: 99%
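To make the PDGD mechanism described above concrete, here is a minimal, illustrative Python sketch of one PDGD-style update for a linear ranker. It samples a ranking via the Gumbel-max trick (equivalent to Plackett-Luce sampling over the model's scores), infers pairwise preferences from clicks, and takes a pairwise gradient step. For brevity it omits PDGD's debiasing weight, which reweights each pair by comparing the probability of the observed ranking with that of the pair-swapped ranking, so this is a simplified sketch rather than the authors' exact algorithm; all function names and parameters are illustrative.

```python
import numpy as np

def plackett_luce_sample(scores, rng):
    """Sample a ranking from a Plackett-Luce distribution over scores
    via the Gumbel-max trick (add Gumbel noise, then sort)."""
    gumbel = rng.gumbel(size=len(scores))
    return np.argsort(-(scores + gumbel))

def infer_preferences(ranking, clicks):
    """Infer pairwise preferences from clicks: each clicked document is
    preferred over the unclicked documents ranked above it and the
    first unclicked document directly below it."""
    prefs = []
    clicked = set(np.flatnonzero(clicks))          # clicked positions
    for pos in clicked:
        for other in list(range(pos)) + [pos + 1]:
            if 0 <= other < len(ranking) and other not in clicked:
                prefs.append((ranking[pos], ranking[other]))
    return prefs

def pdgd_update(w, X, ranking, clicks, lr=0.1):
    """One simplified PDGD-style step for a linear ranker w over
    feature matrix X (one row per document)."""
    scores = X @ w
    grad = np.zeros_like(w)
    for d_i, d_j in infer_preferences(ranking, clicks):
        # P(d_i > d_j) under a softmax over the pair's scores.
        p = 1.0 / (1.0 + np.exp(scores[d_j] - scores[d_i]))
        # Gradient of log P(d_i > d_j); PDGD's debiasing weight is omitted.
        grad += (1.0 - p) * (X[d_i] - X[d_j])
    return w + lr * grad

# Illustrative usage: one simulated interaction on 5 random documents.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                        # 5 documents, 3 features
w = np.zeros(3)
ranking = plackett_luce_sample(X @ w, rng)
clicks = np.zeros(5, dtype=bool)
clicks[1] = True                                   # user clicked rank 2
w = pdgd_update(w, X, ranking, clicks)
```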
“…The earliest method, Dueling Bandit Gradient Descent (DBGD), samples variations of a ranking model and compares them using online evaluation [10]; if an improvement is recognized, the model is updated accordingly. Most online LTR methods have improved the data-efficiency of DBGD [9,24,26]; later work found that DBGD is not effective at optimizing neural models [17] and often fails to find the optimal linear model even in ideal scenarios [18]. In response to these limitations, alternative approaches for online LTR have been proposed.…”
Section: Related Work
confidence: 99%
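As context for the DBGD procedure this statement describes, the following is a minimal Python sketch of one dueling-bandit step with a linear ranker and Team-Draft interleaving. It is an illustrative reconstruction under common simplifying assumptions (a single candidate direction and a click-count win criterion), not the exact implementation from any cited paper; `clicks_fn` is a hypothetical stand-in for user feedback.

```python
import numpy as np

def rank_docs(w, X):
    """Rank documents by descending linear score."""
    return list(np.argsort(-(X @ w)))

def team_draft_interleave(rank_a, rank_b, rng):
    """Team-Draft interleaving: in random order, each ranker picks its
    highest-ranked document not yet in the interleaved list."""
    interleaved, teams, used = [], [], set()
    lists = [list(rank_a), list(rank_b)]
    while len(used) < len(rank_a):
        for t in rng.permutation(2):               # randomize pick order
            while lists[t] and lists[t][0] in used:
                lists[t].pop(0)
            if lists[t]:
                doc = lists[t].pop(0)
                interleaved.append(doc)
                teams.append(t)
                used.add(doc)
    return interleaved, teams

def dbgd_update(w, X, clicks_fn, delta=1.0, eta=0.01, rng=None):
    """One DBGD step: duel the current model against a perturbed
    candidate; move toward the candidate if it wins more clicks."""
    rng = rng or np.random.default_rng()
    u = rng.normal(size=w.shape)
    u /= np.linalg.norm(u)                         # random unit direction
    w_cand = w + delta * u
    inter, teams = team_draft_interleave(rank_docs(w, X),
                                         rank_docs(w_cand, X), rng)
    clicks = clicks_fn(inter)                      # hypothetical user feedback
    wins = [sum(c for c, t in zip(clicks, teams) if t == k) for k in (0, 1)]
    if wins[1] > wins[0]:                          # candidate won the duel
        w = w + eta * u
    return w
```

The data-efficiency improvements cited above largely replace this single duel with multileaved comparisons of many candidate directions per interaction.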
“…Our experimental setup is semi-synthetic: queries, relevance judgements, and documents come from industry datasets, while biased and noisy user interactions are simulated using probabilistic user models. This setup is very common in the counterfactual and online LTR literature [1,15,24]. We make use of the three largest LTR industry datasets: Yahoo!…”
Section: The Semi-synthetic Setup
confidence: 99%
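The simulated interactions referenced in this setup typically come from probabilistic click models applied to the datasets' graded relevance labels. Below is a minimal Python sketch of a cascade-style click simulator; the click and stopping probabilities are illustrative placeholders, not the exact user-model configurations used in the cited experiments.

```python
import numpy as np

# Illustrative click probabilities per relevance grade (0-4); real
# experiments configure these per user model (e.g. perfect vs. noisy).
P_CLICK = {0: 0.05, 1: 0.1, 2: 0.2, 3: 0.4, 4: 0.8}

def simulate_cascade_clicks(ranking, relevance, stop_prob=0.5, rng=None):
    """Cascade user model: scan the ranking top-down, click with a
    relevance-dependent probability, and stop after a click with
    probability stop_prob."""
    rng = rng or np.random.default_rng()
    clicks = np.zeros(len(ranking), dtype=bool)
    for pos, doc in enumerate(ranking):
        if rng.random() < P_CLICK[relevance[doc]]:
            clicks[pos] = True
            if rng.random() < stop_prob:
                break
    return clicks

# Example: simulate clicks on a 5-document ranking with graded labels.
rng = np.random.default_rng(1)
relevance = np.array([4, 0, 2, 1, 3])              # per-document labels
print(simulate_cascade_clicks([0, 2, 4, 1, 3], relevance, rng=rng))
```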