Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2007
DOI: 10.1145/1277741.1277904

Boosting static pruning of inverted files

Abstract: This paper revisits the static term-based pruning technique presented in [2] for ad-hoc retrieval and addresses aspects of its algorithmic design that had not previously been taken into account. Although the original technique retains precision even when a considerable part of the inverted file is removed, we show that precision can be improved in some scenarios if certain key design features are properly selected.

Cited by 16 publications (26 citation statements)
References 1 publication
“…For the pruned index files, the element length, i.e., number of terms in an element, reduces after pruning. In earlier studies [1,3], it is reported that using the updated element lengths results better in terms of effectiveness. We observed the same situation also for XML retrieval case, and thus, use the updated element lengths for each pruning level of TCP and DCP.…”
Section: Performance Comparison Of Indexing Strategies: Focused Task
Citation type: mentioning
confidence: 96%
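The statement above says that element lengths are recomputed after pruning. A minimal sketch of that step follows, assuming a hypothetical in-memory index that maps each term to a list of (element id, term frequency) postings, and assuming that length is counted as the sum of the surviving term frequencies. Neither the data layout nor the function name comes from the cited papers.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical layout: term -> list of (element_id, term_frequency) postings.
Index = Dict[str, List[Tuple[int, int]]]


def element_lengths(index: Index) -> Dict[int, int]:
    """Return the length of each element as the sum of its term frequencies.

    Running this on a pruned index yields the "updated" lengths, because
    postings that were removed no longer contribute to the sum.
    """
    lengths: Dict[int, int] = defaultdict(int)
    for postings in index.values():
        for element_id, tf in postings:
            lengths[element_id] += tf
    return dict(lengths)
```

Calling `element_lengths` once on the full index and once on each pruned index gives the original and updated lengths that a length-normalized scoring function would use.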
“…Next, the k-th highest score, z_t, is determined and all postings that have scores less than z_t * ε are removed, where ε is a user defined parameter to govern the pruning level. Following the practice in [3], we disregard any theoretical guarantees and determine ε values according to the desired pruning level.…”
Section: Pruning the Element-index For Xml Retrieval
Citation type: mentioning
confidence: 99%
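The pruning rule quoted above (find the k-th highest score for a term, then drop every posting scoring below that value times ε) can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in any of the cited papers: postings are assumed to be precomputed (id, score) pairs, the names `prune_postings` and `prune_index` are hypothetical, and a term with fewer than k postings is left unpruned.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical layout: each posting is (document or element id, score).
Posting = Tuple[int, float]


def prune_postings(postings: List[Posting], k: int, epsilon: float) -> List[Posting]:
    """Keep postings whose score is at least epsilon times the k-th highest score."""
    if len(postings) < k:
        # Assumption: with fewer than k postings there is no k-th score,
        # so the list is kept whole.
        return list(postings)
    # k-th highest score among this term's postings.
    kth_score = heapq.nlargest(k, (score for _, score in postings))[-1]
    threshold = kth_score * epsilon
    return [(pid, score) for pid, score in postings if score >= threshold]


def prune_index(index: Dict[str, List[Posting]], k: int, epsilon: float) -> Dict[str, List[Posting]]:
    """Apply the same per-term rule to every posting list in the index."""
    return {term: prune_postings(plist, k, epsilon) for term, plist in index.items()}
```

Raising ε removes more postings per term, which is how a target pruning level can be reached by tuning ε alone, as the quoted passage describes.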
“…In a nutshell, TCP scores (using the Smart's TFIDF function) and sorts the postings of each term in the collection and removes the tail of the list according to some decision criteria. In [1], instead of the TFIDF function, BM25 is employed during the pruning and retrieval stages. In that study, it's shown that by tuning the pruning algorithm according to the score function, it is possible to further boost the performance.…”
Section: Static Inverted Index Pruning
Citation type: mentioning
confidence: 99%
“…Thus, it is hard to infer how these two approaches, namely, TCP and DCP, compare to each other. Furthermore, given the evidence of recent work on how tuning the scoring function boosts the performance [1], it is important to investigate the robustness of these methods for different scoring functions that are employed during the pruning and retrieval, i.e., query execution.…”
Section: Static Inverted Index Pruning
Citation type: mentioning
confidence: 99%