Lecture Notes in Computer Science
DOI: 10.1007/978-3-540-75530-2_26

Efficient Text Proximity Search

Abstract: In addition to purely occurrence-based relevance models, term proximity has been frequently used to enhance retrieval quality of keyword-oriented retrieval systems. While there have been approaches on effective scoring functions that incorporate proximity, there has not been much work on algorithms or access methods for their efficient evaluation. This paper presents an efficient evaluation framework including a proximity scoring function integrated within a top-k query engine for text retrieval. We propose pr…

Cited by 49 publications
(50 citation statements)
References 16 publications
“…The basic segmentation is the one where each keyword is treated as a phrase [6], [7]. Each generated segmentation corresponds to a way of accessing the indexes to compute its answers.…”
Section: Valid Phrases In a Query (mentioning)
confidence: 99%
“…Schenkel et al. [18] developed efficient top-k query processing techniques for a proximity-aware IR model. They focused on a proximity-aware scoring function defined by a linear combination of a standard BM25-based score and a proximity score, and extended an existing top-k query processing technique [20] that was originally intended for a standard IR model such as TF-IDF and BM25.…”
Section: Related Work (mentioning)
confidence: 99%
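To make the "linear combination" in the statement above concrete, here is a minimal Python sketch. It is not the scoring function from the cited paper: the content score is taken as a precomputed number (for example a BM25 value), the proximity part is a simple inverse-squared-distance accumulator chosen only for illustration, and the names `proximity_score`, `combined_score`, and `weight` are hypothetical.

```python
from itertools import combinations


def proximity_score(positions_by_term):
    """Sum 1/d^2 over pairs of occurrences of *different* query terms.

    positions_by_term maps each query term to the list of word positions
    at which it occurs in one document. This accumulator is an assumption
    made for illustration, not the function defined in the paper.
    """
    score = 0.0
    for (_, pos_a), (_, pos_b) in combinations(positions_by_term.items(), 2):
        for a in pos_a:
            for b in pos_b:
                if a != b:  # guard against malformed input with equal positions
                    score += 1.0 / (a - b) ** 2
    return score


def combined_score(content_score, positions_by_term, weight=0.5):
    """Linear combination: content score plus a weighted proximity score."""
    return content_score + weight * proximity_score(positions_by_term)


# Usage: "text" occurs at positions 1 and 10, "search" at position 2.
doc_positions = {"text": [1, 10], "search": [2]}
total = combined_score(2.0, doc_positions, weight=0.5)
```

Occurrences that sit close together dominate the proximity part, so a document with adjacent query terms is ranked above one with the same content score but scattered terms.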
“…They showed that their techniques speeded up evaluation considerably with an improved result quality. However, since the underlying top-k query processing technique intends for relatively short queries as does other existing top-k query processing techniques, those evaluated efficiently by their techniques in [18] are limited to relatively short queries.…”
Section: Related Work (mentioning)
confidence: 99%
“…, T_D} of D string documents of total length n, drawn from an alphabet Σ = [σ], and the query is a pattern P[1..p] over Σ. Muthukrishnan considered a family of problems called thresholded document listing: given an additional parameter K, list only the documents where some function score(P, d) of the occurrences of P in T_d exceeded K. For example, the document mining problem aims to return the documents where P appears at least K times, whereas the repeats problem aims to return the documents where two occurrences of P appear at distance at most K. While document mining has obvious connections with typical term-frequency measures of relevance [6,1], the repeats problem is more connected to various problems in bioinformatics [4,10]. Also notice that the repeats problem is closely related to the term proximity based document retrieval in IR field [32,5,29,33,34]. Muthukrishnan achieved optimal time for both problems, with O(n) space (in words) if K is specified at indexing time and O(n log n) if specified at query time.…”
Section: Introduction (mentioning)
confidence: 99%
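The two thresholded document listing problems named in the statement above can be illustrated with a naive Python scan. This is only a sketch of the problem definitions, not Muthukrishnan's indexed solution: it rescans every document per query, measures distance between occurrence start positions (an assumption), and the function names are hypothetical.

```python
def occurrences(pattern, text):
    """Start positions of all (possibly overlapping) occurrences of pattern."""
    hits = []
    i = text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


def document_mining(docs, pattern, k):
    """Indices of documents in which pattern appears at least k times."""
    return [d for d, text in enumerate(docs)
            if len(occurrences(pattern, text)) >= k]


def repeats(docs, pattern, k):
    """Indices of documents with two occurrences at distance at most k.

    Hits are sorted, so checking consecutive pairs is enough: the closest
    pair of occurrences is always a consecutive one.
    """
    result = []
    for d, text in enumerate(docs):
        hits = occurrences(pattern, text)
        if any(b - a <= k for a, b in zip(hits, hits[1:])):
            result.append(d)
    return result


# Usage on a toy collection of three documents.
docs = ["abxxab", "ab", "abxxxxxxab"]
mined = document_mining(docs, "ab", 2)   # documents 0 and 2 contain "ab" twice
close = repeats(docs, "ab", 4)           # only document 0 has hits 4 apart
```

The repeats check is the string-level analogue of term proximity: it keeps a document only when two matches fall within a fixed window.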