2000
DOI: 10.1006/jcss.2000.1711

Latent Semantic Indexing: A Probabilistic Analysis

Abstract: Latent semantic indexing (LSI) is an information retrieval technique based on the spectral analysis of the term-document matrix, whose empirical success had heretofore been without rigorous prediction and explanation. We prove that, under certain conditions, LSI does succeed in capturing the underlying semantics of the corpus and achieves improved retrieval performance. We propose the technique of random projection as a way of speeding up LSI. We complement our theorems with encouraging experimental results. W…
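LSI, as the abstract describes it, works from the spectral analysis of the term-document matrix A: documents and queries are projected onto the span of the top k left singular vectors of A and compared there. The sketch below is a minimal pure-Python illustration under stated assumptions: raw term counts with no weighting, power iteration on A Aᵀ in place of a library SVD, and cosine ranking. The toy corpus and all function names (`lsi_basis`, `project`, `top_k_eigvecs`) are hypothetical and not taken from the paper.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    n = math.sqrt(dot(v, v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def top_k_eigvecs(M, k, iters=200, seed=0):
    """Top-k eigenvectors of a symmetric PSD matrix M (list of rows), found by
    power iteration with Gram-Schmidt against the vectors already found."""
    rng = random.Random(seed)
    found = []
    for _ in range(k):
        # random start so the wanted direction is present with probability 1
        v = normalize([rng.gauss(0.0, 1.0) for _ in range(len(M))])
        for _ in range(iters):
            w = [dot(row, v) for row in M]          # w = M v
            for u in found:                          # strip earlier directions
                c = dot(w, u)
                w = [wi - c * ui for wi, ui in zip(w, u)]
            v = normalize(w)
        found.append(v)
    return found


def lsi_basis(A, k):
    """Left singular vectors U_k of A (terms x docs) = top-k eigenvectors of A A^T."""
    AAT = [[dot(ri, rj) for rj in A] for ri in A]
    return top_k_eigvecs(AAT, k)


def project(U, x):
    """Coordinates of a term-space vector x in the k-dimensional LSI space."""
    return [dot(u, x) for u in U]


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na > 0 and nb > 0 else 0.0


terms = ["car", "automobile", "engine", "flower", "garden", "rose"]
docs = [["car", "engine"], ["automobile", "engine"],
        ["flower", "garden"], ["garden", "rose"]]
A = [[doc.count(t) for doc in docs] for t in terms]    # term-document counts

U = lsi_basis(A, k=2)
query = [1.0 if t == "automobile" else 0.0 for t in terms]
q_hat = project(U, query)
columns = [[row[j] for row in A] for j in range(len(docs))]
raw_scores = [cosine(query, d) for d in columns]           # plain term matching
lsi_scores = [cosine(q_hat, project(U, d)) for d in columns]
```

On this toy corpus the query "automobile" has raw cosine 0 with the document "car engine", yet in exact arithmetic its LSI score is 1 for both vehicle documents and 0 for both garden documents, because the rank-2 space folds car, automobile, and engine into a single direction. Power iteration is used only to keep the sketch dependency-free; a real system would typically call a sparse truncated-SVD routine instead.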

Cited by 613 publications (536 citation statements)
References 17 publications
“…Term-based search engines face both classical problems in information retrieval, as well as problems specific to the World Wide Web setting, when handling broad-topic queries. The classic problems include the following issues [4,20]. • Synonymy: retrieving documents containing the term 'car' when given the query 'automobile'.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…In [88], a model for documents is developed on which the LSI method is evaluated. In this model, each document is built out of a number of different topics (hidden from the retrieval algorithm).…”
Section: Evaluating the Results of Data Mining (citation type: mentioning)
confidence: 99%
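The model described in the quote, where each document is assembled from topics hidden from the retrieval algorithm, can be made concrete with a small generative sketch. Everything specific here is an assumption: topics as term distributions, a fixed document length, and a random mixture of `topics_per_doc` topics per document. The quote does not give the model's exact parameters, so this is an illustration in its spirit rather than a reproduction; `sample_corpus` and the toy topics are hypothetical.

```python
import random


def sample_corpus(topics, n_docs, doc_len, topics_per_doc, seed=0):
    """Generate documents from hidden topics (hypothetical model).

    topics: list of dicts mapping term -> weight, one distribution per topic.
    Each document picks `topics_per_doc` distinct topics, mixes them with
    random weights, and draws `doc_len` terms from the mixture. Returns the
    documents and, separately, the hidden topic labels a retrieval
    algorithm would not see.
    """
    rng = random.Random(seed)
    docs, hidden = [], []
    for _ in range(n_docs):
        chosen = rng.sample(range(len(topics)), topics_per_doc)
        mix = [rng.uniform(0.1, 1.0) for _ in chosen]      # mixing weights
        doc = []
        for _ in range(doc_len):
            t = rng.choices(chosen, weights=mix)[0]         # pick a topic...
            terms = list(topics[t])
            weights = [topics[t][w] for w in terms]
            doc.append(rng.choices(terms, weights=weights)[0])  # ...then a term
        docs.append(doc)
        hidden.append(sorted(chosen))
    return docs, hidden


topics = [
    {"car": 0.4, "automobile": 0.3, "engine": 0.3},
    {"flower": 0.3, "garden": 0.4, "rose": 0.3},
    {"stock": 0.5, "market": 0.5},
]
docs, hidden = sample_corpus(topics, n_docs=5, doc_len=8, topics_per_doc=2)
```

Only `docs` would be handed to a retrieval method such as LSI; `hidden` is kept aside to check whether the method's latent space lines up with the topics that generated the corpus.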
“…Because the probability of multiple roots is so small when k ≥ 8 while in the large-scale applications we expect k ≫ 10, we suggest not to worry about multiple roots. Also, we will only use the O(k⁻¹) term of Var(â_MLE), i.e., (7). Figure 3 presents some simulation results, using two words "THIS" and "HAVE," from some MSN Web crawl data.…”
Section: This Probability Is (Crudely) Bounded By (citation type: mentioning)
confidence: 99%
“…Random projections [1] have been used in machine learning [2][3][4][5][6] and many other applications in data mining and information retrieval, e.g., [7][8][9][10][11][12].…”
Section: Introduction (citation type: mentioning)
confidence: 99%
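The abstract proposes random projection as a way to speed up LSI, and the quote above notes its wider use in data mining and retrieval. Below is a minimal sketch of the basic projection step, assuming a dense Gaussian matrix with N(0, 1/l) entries (a common choice; the paper's own construction may differ). Each n-dimensional term vector is mapped to l dimensions while pairwise Euclidean distances are approximately preserved, so a subsequent SVD can run on a much smaller matrix. Sizes, seeds, the synthetic documents, and function names are illustrative assumptions.

```python
import math
import random


def gaussian_projection(n, l, seed=0):
    """Random l x n matrix with i.i.d. N(0, 1/l) entries, so that
    E[||R x||^2] = ||x||^2 for every fixed x."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(l)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(l)]


def project_vec(R, x):
    """Map x (length n) to R x (length l)."""
    return [sum(r * xi for r, xi in zip(row, x)) for row in R]


def dist(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


n, l = 500, 300                      # original and reduced dimensions (illustrative)
rng = random.Random(1)
# three hypothetical sparse "documents": 20 random terms, each with count 1..3
docs = []
for _ in range(3):
    v = [0.0] * n
    for t in rng.sample(range(n), 20):
        v[t] = float(rng.randint(1, 3))
    docs.append(v)

R = gaussian_projection(n, l)
low = [project_vec(R, d) for d in docs]
ratios = [dist(low[i], low[j]) / dist(docs[i], docs[j])
          for i in range(3) for j in range(i + 1, 3)]
```

Each entry of `ratios` should be close to 1; with l = 300 the typical deviation is a few percent, and it shrinks roughly as 1/√l.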