Latent semantic indexing and large dataset: Study of term-weighting schemes

Zaman, A. N. K.; Brown, Charles Grant

doi:10.1109/icdim.2010.5664669

Cited by 9 publications

(6 citation statements)

References 7 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Three different term weighting schemes and own list of stop words have used to judge the performance. Recall, Precision and Coefficient of Variation were used to evaluate the retrieval performance of LSI based retrieval system [17]. Khaled M. Hammouda explored the concept of document indexing technique with more informative features including phrases and their weights that are important for indexing.…”

Section: Related Workmentioning

confidence: 99%

An Intelligent Approach to Information Retrieval System Using Enhanced DIG and FP-Tree Techniques

Janarthanan¹,

Rajkumar²

2014

IOSRJCE

View full text Add to dashboard Cite

Information retrieval is the process of retrieving all the relevant documents that satisfies the user query from large corpora. It is aimed to provide the relevant information and documents that matches the user query. Outcome of the several research results confirms that difficulties in information retrieval are matching the query with corpus. Consequently, the enhanced indexing technique named Document Index Graph (DIG) used for indexing document collection in order to match and retrieve information efficiently. Hence, an enhanced DIG has been constructed that stores all the stemmed sentences of documents in the graph. The words with same stem can be stored only once in DIG. This helps to reduce the size of the graph. The most frequently appearing words are planted into FP (Frequent Pattern) Tree. The FP-tree is a compact representation of all relevant frequently occurring information in a corpus. The enhanced FP tree with a table generates all types of possible term set which satisfy the minimum support. Information is retrieved with the help of FP-Tree and Document Index Graph.

show abstract

Section: Related Workmentioning

confidence: 99%

An Intelligent Approach to Information Retrieval System Using Enhanced DIG and FP-Tree Techniques

Janarthanan¹,

Rajkumar²

2014

IOSRJCE

View full text Add to dashboard Cite

show abstract

“…However, traditional log entropy weighting takes an insufficient approach and fails to account for local and global attributes of words present within a corpus [11]- [13]. To address these limitations of log entropy weighting approaches such as log entropy weighting in this research, we developed TWLE as an improved weighting alternative that considers both local and global attributes of words [14], [15]. Local weights help capture context significance through evaluation of document frequency while global weights indicate importance within topic formation by considering corpus wide frequency [16], [17].…”

Section: Introductionmentioning

confidence: 99%

Improving the term weighting log entropy of latent dirichlet allocation

Muhajir,

Rosadi,

Danardono

2024

IJEECS

View full text Add to dashboard Cite

<p class="AbstractText">The process of analyzing textual data involves the utilization of topic modeling techniques to uncover latent subjects within documents. The presence of numerous short texts in the Indonesian language poses additional challenges in the field of topic modeling. This study presents a substantial enhancement to the term weighting log entropy (TWLE) approach within the latent dirichlet allocation (LDA) framework, specifically tailored for topic modeling of Indonesian short texts. This work places significant emphasis on the utilization of LDA for word weighting. The research endeavor aimed to enhance the coherence and interpretability of an Indonesian topic model through the integration of local and global weights. Local Weight focuses on the distinct characteristics of each document, whereas global weight examines the broader perspective of the entire corpus of documents. The objective was to enhance the effectiveness of LDA themes by this amalgamation. The TWLE model of LDA was found to be more informative and effective than the TF-IDF LDA when compared with short Indonesian text. This work improves topic modeling in brief Indonesian compositions. Transfer learning for NLP and Indonesian language adaptation helps improve subject analysis knowledge and precision, this could boost NLP and topic modeling in Indonesian.</p>

show abstract

“…They are not considered influential during the execution of LSI process to retrieve relevant documents. It also reduces the size of the indexing structure considerably [9].…”

Section: Stop Word Removalmentioning

confidence: 99%

“…[9]. After building the term-document matrix Normalize it using where A is the TDM and n is the total no of words in document j.…”

Section: Term-document Matrixmentioning

confidence: 99%

Framework for Document Retrieval using Latent Semantic Indexing

Phadnis¹,

Gadge²

2014

IJCA

View full text Add to dashboard Cite

Today, with the rapid development of the Internet, textual information is growing rapidly. So document retrieval which aims to find and organize relevant information in text collections is needed. With the availability of large scale inexpensive storage the amount of information stored by organizations will increase. Searching for information and deriving useful facts will become more cumbersome. How to extract a lot of information quickly and effectively has become the focus of current research and hot topics.The state of the art for traditional IR techniques is to find relevant documents depending on matching words in users' query with individual words in text collections. The problem with Content-based retrieval systems is that documents relevant to a users' query are not retrieved, and many unrelated or irrelevant materials are retrieved. In this paper information retrieval method is proposed based on LSI approach. Latent Semantic Indexing (LSI) model is a concept based retrieval method that exploits the idea of vector space model and singular value decomposition. The goal of this research is to evaluate the applicability of LSI technique for textual document search and retrieval.

show abstract

Latent semantic indexing and large dataset: Study of term-weighting schemes

Cited by 9 publications

References 7 publications

An Intelligent Approach to Information Retrieval System Using Enhanced DIG and FP-Tree Techniques

An Intelligent Approach to Information Retrieval System Using Enhanced DIG and FP-Tree Techniques

Improving the term weighting log entropy of latent dirichlet allocation

Framework for Document Retrieval using Latent Semantic Indexing

Contact Info

Product

Resources

About