Abstract: A major barrier to successful retrieval from external sources (e.g., electronic databases) is the tremendous variability in the words that people use to describe objects of interest. The fact that different authors use different words to describe essentially the same idea means that relevant objects will be missed; conversely, the fact that the same word can be used to refer to many different things means that irrelevant objects will be retrieved. We describe a statistical method called latent semantic indexin…
“…A brief overview of LSA will be provided here. More complete descriptions of LSA may be found in Deerwester, Dumais, Furnas, Landauer, & Harshman (1990) and Dumais (1990).…”
Latent semantic analysis (LSA) is a statistical model of word usage that permits comparisons of semantic similarity between pieces of textual information. This paper summarizes three experiments that illustrate how LSA may be used in text-based research. Two experiments describe methods for analyzing a subject's essay, both to determine from which text the subject learned the information and to grade the quality of the information cited in the essay. The third experiment describes using LSA to measure the coherence and comprehensibility of texts.

One of the primary goals in text-comprehension research is to understand what factors influence a reader's ability to extract and retain information from textual material. The typical approach is to have subjects read textual material and then produce some form of summary, such as answering questions or writing an essay. This summary permits the experimenter to determine what information the subject has gained from the text.

To analyze what a subject has learned from a text, the experimenter must relate what was in the summary to what the subject has read. This permits the subject's representation (cognitive model) of the text to be compared with the representation expressed in the original text. For such an analysis, the experimenter must examine each sentence in the subject's summary and match the information it contains to the information contained in the texts that were read. Information in the summary that is highly related to information from the texts was likely learned from those texts. However, matching this information is not easy: it requires scanning through the original texts to locate the information, and since subjects do not write exactly the same words as those that they have read, it is not possible to look for exact matches. Instead, the experimenter must make the match on the basis of the semantic content of the text. This work has benefited from collaborative research with
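The matching procedure described above can be sketched in code: represent each document in a reduced latent space via a truncated SVD of a term-by-document matrix, then compare a summary sentence to each source text by cosine similarity. This is only a minimal illustration of the general LSA approach, not the paper's actual procedure; the toy corpus and the choice of k = 2 latent dimensions are assumptions for the sake of the example.

```python
import numpy as np

# Toy corpus: two "source texts" and one "summary sentence".
# In a real study the corpus and vocabulary would be far larger.
docs = [
    "the cat sat on the mat",                   # source text 0
    "stocks fell as markets reacted to news",   # source text 1
    "a cat rested on a mat",                    # summary sentence
]

# Build a term-by-document count matrix A (terms as rows, documents as columns).
vocab = sorted({w for d in docs for w in d.split()})
A = np.array([[d.split().count(w) for d in docs] for w in vocab], dtype=float)

# Truncated SVD: keep k latent dimensions (k chosen arbitrarily here).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # each row: one document in latent space

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# The summary shares no exact words with source 1 but overlaps source 0
# semantically (cat, on, mat), so its latent vector lies closer to source 0.
print(cosine(doc_vecs[2], doc_vecs[0]), cosine(doc_vecs[2], doc_vecs[1]))
```

Because similarity is computed in the latent space rather than on raw word overlap, near-synonymous sentences can match even without identical wording, which is exactly the property the matching task requires.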
Automatic categorization of text documents has become an important area of research in the last two decades, with features that make it significantly more difficult than the traditional classification tasks studied in machine learning. A more recent development is the need to classify hypertext documents, most notably web pages. These have features that add further complexity to the categorization task but also offer the possibility of using information that is not available in standard text classification, such as metadata and the content of the web pages that point to and are pointed at by a web page of interest. This chapter surveys the state of the art in text categorization and hypertext categorization, focussing particularly on issues of representation that differentiate them from 'conventional' classification tasks and from each other.
“…This system is called Latent Semantic Indexing (LSI) [Dum91] and was the product of Susan Dumais, then at Bell Labs. LSI simply creates a low-rank approximation A_k to the term-by-document matrix A from the vector space model.…”
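The low-rank approximation A_k mentioned in this snippet can be sketched with a truncated SVD. The matrix below is a small hypothetical term-by-document example (values are illustrative, not from any cited work); by the Eckart-Young theorem, truncating the SVD gives the best rank-k approximation in the Frobenius norm.

```python
import numpy as np

# Hypothetical 5-term x 4-document count matrix (illustrative values only).
A = np.array([
    [1., 0., 1., 0.],
    [1., 1., 0., 0.],
    [0., 1., 1., 1.],
    [0., 0., 1., 1.],
    [1., 0., 0., 1.],
])

# Rank-k approximation A_k from the truncated SVD: keep the k largest
# singular values and their singular vectors, discard the rest.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

# The Frobenius-norm error of A_k equals the energy in the discarded
# singular values: ||A - A_k||_F = sqrt(s[k]^2 + s[k+1]^2 + ...).
err = np.linalg.norm(A - A_k, "fro")
print(np.linalg.matrix_rank(A_k), err)
```

Queries and documents are then compared in this rank-k space, which is what lets LSI retrieve documents that share no exact terms with the query.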
Section: Latent Semantic Indexing
“…[Dum91], [BB05], [BR99], [Ber01], [BDJ99]: LSI is known to outperform the vector space model in terms of precision and recall.…”
Section: [Mey00] If the Term-by-document Matrix A m×n Has the Singular…
“…[BR99], [Ber01], [BB05], [BF96], [BDJ99], [BO98], [Blo99], [BR01], [Dum91], [HB00], [JL00], [JB00], [LB97], [WB98], [ZBR01], [ZMS98]: LSI and the truncated singular value decomposition dominated text mining research in the 1990s.…”