2006
DOI: 10.1016/j.ipm.2004.11.007

A framework for understanding Latent Semantic Indexing (LSI) performance

Cited by 148 publications (90 citation statements)
References 15 publications
“…In LSA, researchers may use more than one weighting scheme, and may subject the calculated results to further factor analysis or clustering. We refer the reader to Kontostathis & Pottenger (2006). In conclusion, our work illustrates in a 'proof-of-concept' approach how two modern computational analyses can be used, in isolation as well as in complementary fashion, to aid the content analysis of large corpora of text. We show how the results of one analysis (LSA) can be used to inform our understanding of the trends between separate data sets, and we demonstrate how a text mining analysis (using Leximancer) can be used to provide further insights into the underlying rationale of the outcomes of the LSA analysis.…”
Section: Discussion (mentioning)
confidence: 99%
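The excerpt above notes that LSA practitioners may apply more than one weighting scheme before further analysis. As a purely illustrative sketch, not taken from the cited paper, the following NumPy snippet applies log-entropy weighting, one commonly used LSA scheme; the toy count matrix and variable names are assumptions.

```python
# Minimal sketch of log-entropy term weighting for LSA (illustrative only).
import numpy as np

counts = np.array([            # rows = terms, columns = documents (toy data)
    [2, 0, 1, 0],
    [0, 3, 0, 1],
    [1, 1, 0, 2],
], dtype=float)

n_docs = counts.shape[1]
local = np.log1p(counts)                       # local weight: log(1 + f_ij)

row_sums = counts.sum(axis=1, keepdims=True)   # total frequency of each term
p = counts / np.maximum(row_sums, 1e-12)       # distribution of each term over documents
plogp = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
global_w = 1.0 + plogp.sum(axis=1) / np.log(n_docs)   # entropy-based global weight

weighted = local * global_w[:, None]           # matrix that would feed the SVD step
print(weighted.round(3))
```

The weighted matrix would then be passed to the SVD/dimension-reduction step sketched at the end of this section.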
“…We determine the sparsity pattern of M_1 column by column. First, find the largest entry in each column of A, suppose they are a_{31}, a_{12} …”
Section: Multistep Approach (mentioning)
confidence: 99%
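The excerpt quotes only the first step of the multistep procedure: locating the largest entry in each column of A. A minimal NumPy sketch of that single step follows; the example matrix is an assumption, and reading "largest" as largest in magnitude is also an assumption, since the rest of the procedure is not quoted.

```python
# Sketch of the quoted first step only: largest entry per column of A.
# Everything beyond the argmax (including the magnitude interpretation) is assumed.
import numpy as np

A = np.array([
    [0.2, 0.9, 0.1],
    [0.5, 0.3, 0.4],
    [0.8, 0.1, 0.7],
])

rows = np.argmax(np.abs(A), axis=0)   # row index of the largest-magnitude entry per column
for j, i in enumerate(rows):
    print(f"column {j + 1}: largest entry a_{{{i + 1}{j + 1}}} = {A[i, j]}")
```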
“…8 and it shows that our MSSP is not comparable with CPM. Compared with the methods used in [13,12,14,3], the NPL collection preprocessed in our way is much larger, and the data matrix has the property that m < n, i.e. the number of documents exceeds the number of terms.…”
Section: Numerical Experiments (mentioning)
confidence: 99%
“…where T is the matrix of left singular vectors (a term-by-dimension matrix), S is the diagonal matrix of singular values (dimension by dimension), and D is the matrix of right singular vectors (a document-by-dimension matrix) [31]. The decomposed matrices are then truncated to a dimension less than the original value k, and the original matrix X is approximated in the reduced latent space, which represents semantic relationships between terms better than the original k-dimensional document space.…”
Section: Singular Value Decomposition (mentioning)
confidence: 99%
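The excerpt above describes the usual LSI decomposition, X ≈ T S Dᵀ, followed by truncation to a reduced number of latent dimensions. The following NumPy sketch illustrates that truncated SVD on a toy term-document matrix; the matrix values, the choice k = 2, and the variable names are illustrative assumptions, not taken from the cited paper.

```python
# Minimal sketch of truncated SVD as used in LSI: X ≈ T_k S_k D_k^T (illustrative only).
import numpy as np

X = np.array([                     # term-document matrix (terms x documents, toy data)
    [1, 0, 0, 2],
    [0, 3, 1, 0],
    [2, 1, 0, 1],
    [0, 0, 4, 1],
], dtype=float)

T, s, Dt = np.linalg.svd(X, full_matrices=False)   # X = T diag(s) D^T

k = 2                              # retain k latent dimensions, fewer than the original rank
T_k, S_k, D_k = T[:, :k], np.diag(s[:k]), Dt[:k, :].T

X_k = T_k @ S_k @ D_k.T            # rank-k approximation of X in the reduced latent space
doc_coords = S_k @ D_k.T           # documents represented in the k-dimensional space
print(X_k.round(2))
print(doc_coords.round(2))
```

Retaining only the k largest singular values yields the closest rank-k approximation to X in the least-squares sense, which is the "reduced latent space" the excerpt refers to.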