2006
DOI: 10.1007/11687238_21

Indexing Shared Content in Information Retrieval Systems

Abstract: Modern document collections often contain groups of documents with overlapping or shared content. However, most information retrieval systems process each document separately, causing shared content to be indexed multiple times. In this paper, we describe a new document representation model where related documents are organized as a tree, allowing shared content to be indexed just once. We show how this representation model can be encoded in an inverted index and we describe algorithms for evaluating…
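The tree model in the abstract can be illustrated with a minimal sketch. This is not the paper's encoding or its query algorithms, which the truncated abstract does not show. It assumes, purely for illustration, that a document's full content is its own text plus the text of its ancestors (as in an email thread where replies quote earlier messages). The names `Node`, `build_index`, and `matches` are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """One document in a tree of related documents (hypothetical structure)."""
    doc_id: int
    own_text: str                       # content introduced at this node only
    children: list[Node] = field(default_factory=list)


def build_index(root: Node):
    """Index each node's own text exactly once and record each node's parent."""
    postings = defaultdict(set)         # term -> ids of nodes whose own text has it
    parent = {root.doc_id: None}
    stack = [root]
    while stack:
        node = stack.pop()
        for term in node.own_text.lower().split():
            postings[term].add(node.doc_id)
        for child in node.children:
            parent[child.doc_id] = node.doc_id
            stack.append(child)
    return postings, parent


def matches(doc_id, terms, postings, parent):
    """Assumption: a document sees its own terms plus those of its ancestors."""
    chain = set()
    cur = doc_id
    while cur is not None:              # walk up to the root
        chain.add(cur)
        cur = parent[cur]
    return all(postings.get(t, set()) & chain for t in terms)


# A small thread: message 1 is the root, 2 and 4 reply to it, 3 replies to 2.
thread = Node(1, "quarterly budget meeting", [
    Node(2, "sounds good", [Node(3, "thanks")]),
    Node(4, "reschedule please"),
])
postings, parent = build_index(thread)
```

Under these assumptions the term "budget" gets a single posting (for node 1), yet `matches(3, ["budget", "thanks"], postings, parent)` still succeeds because node 3 inherits the root's content, while `matches(4, ["thanks"], postings, parent)` does not.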

Cited by 33 publications (47 citation statements)
References 20 publications
“…Indexing versioned document collections has been studied in [7,25,14,13]. Broder et al [7] propose a technique that exploits large content overlaps between documents to achieve a reduction in index size.…”
Section: Indexing Versioned Document Collections (mentioning)
confidence: 99%
“…Broder et al [7] propose a technique that exploits large content overlaps between documents to achieve a reduction in index size. Each version is partitioned into a set of fragments, e.g., an email is partitioned into two fragments, subject and body.…”
Section: Indexing Versioned Document Collections (mentioning)
confidence: 99%
“…The so-called dictionary of the inverted index can be organized in different ways in order to meet the required types of queries and specification of data. In our research we will be focused on search trees [2,4,5,7].…”
Section: Introduction (mentioning)
confidence: 99%
“…Both inverted indexes for word and phrase queries over natural language texts [2,5,11,12], and other indexes for general string collections [16,6,14,7], have been pursued.…”
Section: Introduction (mentioning)
confidence: 99%