2013
DOI: 10.1007/978-3-642-36973-5_36

Two-Stage Learning to Rank for Information Retrieval

Abstract: Current learning to rank approaches commonly focus on learning the best possible ranking function given a small, fixed set of documents. This document set is often retrieved from the collection using a simple unsupervised bag-of-words method, e.g. BM25. This can lead to learning a sub-optimal ranking, since many relevant documents may be excluded from the initially retrieved set. In this paper we propose a novel two-stage learning framework to address this problem. We first learn a ranking…
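The two-stage idea in the abstract can be sketched as a pipeline: a cheap bag-of-words scorer retrieves a candidate set, and a richer function reranks only those candidates. This is a minimal toy illustration, not the paper's actual models; both scoring functions here are hypothetical stand-ins.

```python
# Toy two-stage retrieval pipeline (illustrative scorers only).
from collections import Counter

docs = {
    "d1": "learning to rank for information retrieval",
    "d2": "bag of words retrieval with bm25",
    "d3": "neural networks for image classification",
}

def stage_one_score(query, doc):
    # Cheap unsupervised score: raw term-overlap count (a BM25 stand-in).
    d_counts = Counter(doc.split())
    return sum(d_counts[t] for t in query.split())

def stage_two_score(query, doc):
    # Hypothetical "learned" reranker: overlap normalized by document
    # length, standing in for a trained ranking function.
    return stage_one_score(query, doc) / (1 + len(doc.split()))

def two_stage_rank(query, k=2):
    # Stage 1: retrieve the top-k candidates from the full collection.
    candidates = sorted(docs, key=lambda d: stage_one_score(query, docs[d]),
                        reverse=True)[:k]
    # Stage 2: rerank only the candidate set with the richer model.
    return sorted(candidates, key=lambda d: stage_two_score(query, docs[d]),
                  reverse=True)

print(two_stage_rank("information retrieval"))
```

The point the abstract makes is that any document pruned away in stage 1 can never be recovered by stage 2, which is why the first-stage ranker matters.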

Cited by 54 publications (37 citation statements)
References 22 publications
“…Independently of the algorithm or the loss function adopted, we observe that the cost for computing score S(q, d) is linear in the size n of the forest. As it is desirable to keep this cost as low as possible, either to comply with time budget constraints or to improve the overall effectiveness of query processing by ranking larger amount of candidate documents returned for a given query [4], we aim at reducing the complexity of a tree-based model by pruning trees. Specifically, given an input forest F providing the desired quality, CLEaVER produces a smaller forest Fp with at least the same effectiveness as F but with higher efficiency.…”
Section: Optimization of Tree Ensembles
confidence: 99%
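The claim above — that the cost of computing S(q, d) is linear in the forest size n, so pruning trees directly buys efficiency — can be illustrated with a toy ensemble. This is not the CLEaVER algorithm itself, just a sketch of the cost argument; the "trees" and pruning criterion are invented for illustration.

```python
# Toy illustration of the linear-cost argument: scoring a tree
# ensemble takes one traversal per tree, so cost grows with the
# number of trees, and dropping low-contribution trees shrinks it.

# Represent each "tree" as a callable returning a partial score.
forest = [lambda x, w=w: w * x for w in [0.5, 0.3, 0.001, 0.2, 0.0005]]

def score(x, trees):
    # Cost is len(trees) evaluations: linear in the forest size n.
    return sum(t(x) for t in trees)

def prune(trees, threshold=0.01):
    # Crude stand-in for pruning: drop trees whose contribution
    # on a probe input is negligible.
    return [t for t in trees if abs(t(1.0)) >= threshold]

pruned = prune(forest)
print(len(forest), len(pruned))   # 5 trees -> 3 trees
print(score(2.0, pruned))         # nearly identical score, fewer evaluations
```

The quoted passage makes the same trade precisely: produce a smaller forest Fp with at least the effectiveness of F but lower evaluation cost.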
“…In (Lai et al 2013), the authors presented a sparse learning-to-rank model for information retrieval. Dang et al (2013) proposed a two-stage learning-to-rank framework to address the problem of sub-optimal ranking when many relevant documents are excluded from the ranking list using bag-of-words retrieval models. In (Tan et al 2013), the authors proposed a model which directly optimizes the ranking measure without resorting to any upper bounds or approximations.…”
Section: Related Work
confidence: 99%
“…For our purposes, type(s_6) = visible, as we can substitute f_6(s_2, s_3, p_7) in place of s_6 into the input of f_8. Note that this can work from the other end as well: given that s_8 = f_8(s_5, s_6, p_9, p_10), given the substitution of s_6 and that type(s_8) = visible, we can then construct an inverse f_8^{-1} to recover what the value of s_5 is:…”
Section: Inferring Hidden Subscores
confidence: 99%
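The substitution-and-inversion argument quoted above can be made concrete with assumed functional forms. Everything below is hypothetical: the quoted paper does not specify f_6 or f_8, so this sketch simply picks f_8 linear in s_5 so that an inverse exists.

```python
# Hypothetical instance of the subscore-inversion idea: a visible
# output s_8, plus s_6 reconstructed by substitution, lets us solve
# for the hidden subscore s_5 when f_8 is invertible in s_5.

def f6(s2, s3, p7):
    return s2 + s3 * p7           # assumed form, for illustration only

def f8(s5, s6, p9, p10):
    return p9 * s5 + p10 * s6     # linear in s5, hence invertible

def f8_inv(s8, s6, p9, p10):
    # Solve s8 = p9*s5 + p10*s6 for the hidden subscore s5.
    return (s8 - p10 * s6) / p9

# Known/visible quantities (made-up values).
s2, s3, p7 = 1.0, 2.0, 3.0
p9, p10 = 0.5, 0.25
s5_true = 4.0

s6 = f6(s2, s3, p7)               # substitute f6's output for s_6
s8 = f8(s5_true, s6, p9, p10)     # the visible final score
print(f8_inv(s8, s6, p9, p10))    # recovers 4.0
```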
“…In this work, we present a parameter tuning approach which can be used for either stage. This approach is especially impactful for the retrieval stage: while a large amount of work focuses on optimizing the parameters of the ranking stage, relatively little work [3,8] covers parameter tuning at the retrieval stage.…”
Section: Introduction
confidence: 99%
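Retrieval-stage parameter tuning of the kind mentioned above can be sketched as a grid search over first-stage parameters such as BM25's k1 and b. This is a generic sketch, not the citing paper's specific approach; the validation metric here is a synthetic stand-in for a real effectiveness measure like NDCG.

```python
# Sketch of retrieval-stage parameter tuning via grid search
# (generic illustration; the metric below is synthetic).
import itertools

def validation_metric(k1, b):
    # Stand-in for running retrieval and measuring effectiveness;
    # a made-up function peaked near k1=1.2, b=0.75.
    return -((k1 - 1.2) ** 2 + (b - 0.75) ** 2)

grid_k1 = [0.6, 0.9, 1.2, 1.5]
grid_b = [0.25, 0.5, 0.75, 1.0]

# Evaluate every (k1, b) pair and keep the best-scoring one.
best = max(itertools.product(grid_k1, grid_b),
           key=lambda kb: validation_metric(*kb))
print(best)   # (1.2, 0.75)
```

In practice the same sweep applies to either stage; the quoted point is simply that the retrieval stage has received far less of this attention than the ranking stage.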