Essential Dimensions of Latent Semantic Indexing (LSI)

Indonesia is a state governed by law. Its regulations govern relationships among members of the community, and criminal law is one such body of rules. The criminal law is codified in the Kitab Undang-undang Hukum Pidana (KUHP), which contains hundreds of clauses regulating relationships within the community on the basis of values, norms, and specific rules that focus on the public interest. In this paper, information retrieval is used to search for KUHP clauses that match a description of a crime, using Latent Semantic Indexing (LSI). LSI applies a mathematical dimension-reduction technique, Singular Value Decomposition (SVD). The system uses 60 clauses as training data and 6 queries (crime descriptions) as test data. Each KUHP clause record contains the clause number, the clause, and the clause contents. The system calculates and determines the relevant clauses from the query, or crime description, that has been entered. Cosine similarity is used to measure the similarity, or proximity, between a KUHP clause and the query. System performance is reported as Mean Average Precision (MAP) at k-ranks of 5, 10, 20, 30, 40, 50, and 59; the highest performance is at k-rank 40, with a MAP of 0.8944.
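The retrieval steps described in this abstract (SVD of a term-document matrix, truncation to a k-rank, cosine similarity between query and documents) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy matrix, the term labels, and the function name `lsi_rank` are hypothetical, and folding the query in as `U_k^T q` is one common LSI variant assumed here.

```python
import numpy as np

def lsi_rank(tdm, query, k):
    """Rank documents against a query in a k-dimensional LSI space."""
    # Thin SVD of the term-document matrix: tdm = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(tdm, full_matrices=False)
    # Document coordinates in the reduced space (one row per document)
    docs_k = (s[:k, None] * Vt[:k, :]).T
    # Assumed folding-in step: project the query onto the top-k term axes
    q_k = U[:, :k].T @ query
    # Cosine similarity; documents with no reduced-space weight score 0
    denom = np.linalg.norm(docs_k, axis=1) * np.linalg.norm(q_k)
    scores = np.divide(docs_k @ q_k, denom,
                       out=np.zeros(len(docs_k)), where=denom > 1e-12)
    return np.argsort(-scores), scores

# Hypothetical toy data: rows are terms, columns are four short documents.
# Terms: theft, steal, property, assault, injury
tdm = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
], dtype=float)

# Query containing the terms "steal" and "property"
query = np.array([0, 1, 1, 0, 0], dtype=float)

order, scores = lsi_rank(tdm, query, k=2)
print(order, np.round(scores, 4))
```

In this toy case the second document does not contain "steal" but still ranks alongside the first, because both load on the same latent dimension at k = 2.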


“…The process computes the similarity between two vectors, namely the vector from the corpus and the vector from the query (Kontostathis 2007 …”

Section: Vector Space Model (VSM)

Citation type: unclassified

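Pencarian Pasal pada Kitab Undang-Undang Hukum Pidana (KUHP) Berdasarkan Kasus Menggunakan Metode Cosine Similarity dan Latent Semantic Indexing (LSI) [Searching for Clauses of the Indonesian Criminal Code (KUHP) Based on Cases Using the Cosine Similarity and Latent Semantic Indexing (LSI) Methods]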

Baskoro

Ridok

Furqon

2015

JEEST


“…The query intersects the TDM at the first and last documents. As mentioned before, the TDM is diagonal and therefore the S matrix is also diagonal, so removing any elements from the diagonal of S removes the same diagonal values from the original TDM. Thus, when the SVD is applied over the range of k-values 1 to 14, only the first document is returned, because in this range the last document, document number 15, is ignored. It is important to note that applying LSI at k-value 15, which means no dimension reduction occurs, returns two documents, the first and the last in the TDM, each with a cosine value of 0.7071.…”

Section: II-C The TDM Diagonal

Citation type: mentioning

confidence: 99%
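The diagonal-TDM observation quoted above can be reproduced with a short sketch. The quote does not give the diagonal entries, so strictly decreasing weights are assumed here so that truncation drops the last documents first; the function name `retrieved` and the match threshold are also illustrative assumptions.

```python
import numpy as np

n = 15
# Assumption: strictly decreasing diagonal weights, so the singular values
# keep the documents in their original order and truncation drops the last
# documents first.
tdm = np.diag(np.arange(n, 0, -1, dtype=float))

# Query sharing one term with the first document and one with the last.
query = np.zeros(n)
query[[0, n - 1]] = 1.0

U, s, Vt = np.linalg.svd(tdm)

def retrieved(k, threshold=1e-9):
    """Return {doc_index: cosine} for documents matching the query at rank k."""
    tdm_k = (U[:, :k] * s[:k]) @ Vt[:k, :]   # rank-k approximation of the TDM
    hits = {}
    for j in range(n):
        doc = tdm_k[:, j]
        norm = np.linalg.norm(doc) * np.linalg.norm(query)
        # A document whose column was truncated away has zero norm
        cos = float(doc @ query / norm) if norm > 1e-12 else 0.0
        if cos > threshold:
            hits[j] = round(cos, 4)
    return hits

for k in (1, 14, 15):
    print(k, retrieved(k))
```

The value 0.7071 follows from the geometry: the query has two unit entries and each matching document has a single nonzero entry, so the cosine is 1/√2 whatever the diagonal weight.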

“…Choosing an optimal dimensionality reduction parameter (k-value) is very important and remains elusive. Traditionally, the optimal k-value has been chosen by running a set of queries with known relevant document sets for multiple values of k [5]. The k-value that returns the best results is chosen as the optimal k-value for each collection.…”

Section: Introduction

Citation type: mentioning

confidence: 99%
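The traditional selection procedure described in the quote above (run queries with known relevant documents at several k-values and keep the best) might look like the sketch below. Mean Average Precision is assumed as the quality measure; the toy matrix, the queries, the relevance sets, and the names `cosine_scores`, `average_precision`, and `best_k` are all hypothetical.

```python
import numpy as np

def cosine_scores(tdm, query, k):
    """Cosine similarity of each document to the query in a rank-k LSI space."""
    U, s, Vt = np.linalg.svd(tdm, full_matrices=False)
    docs_k = (s[:k, None] * Vt[:k, :]).T      # document coordinates
    q_k = U[:, :k].T @ query                  # assumed query folding-in
    denom = np.linalg.norm(docs_k, axis=1) * np.linalg.norm(q_k)
    return np.divide(docs_k @ q_k, denom,
                     out=np.zeros(len(docs_k)), where=denom > 1e-12)

def average_precision(ranking, relevant):
    """Mean of precision values at the rank of each relevant document."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def best_k(tdm, queries, relevant_sets, k_values):
    """Return the k with the highest MAP, plus the MAP for every k tried."""
    map_by_k = {}
    for k in k_values:
        aps = []
        for q, rel in zip(queries, relevant_sets):
            ranking = np.argsort(-cosine_scores(tdm, q, k), kind="stable")
            aps.append(average_precision(ranking.tolist(), rel))
        map_by_k[k] = float(np.mean(aps))
    # Ties are resolved in favour of the smallest k tried first
    return max(map_by_k, key=map_by_k.get), map_by_k

# Hypothetical toy collection: 5 terms (rows) by 4 documents (columns)
tdm = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
], dtype=float)
queries = [np.array([0, 1, 1, 0, 0], dtype=float),
           np.array([0, 0, 0, 0, 1], dtype=float)]
relevant_sets = [{0, 1}, {2, 3}]

k_opt, map_by_k = best_k(tdm, queries, relevant_sets, k_values=[1, 2, 3, 4])
print(k_opt, map_by_k)
```

With this toy data, k = 1 keeps only the latent dimension for the first topic, so the second query cannot be matched and the MAP drops; from k = 2 onward both topics are represented.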


Textual noise analysis and removal for effective search engines

Jaber

Amira

Milligan

2010

2010 2nd European Workshop on Visual Information Processing (EUVIP)


In the field of intelligent information retrieval (IR), latent semantic indexing (LSI) is a popular technique for retrieving information that is related in meaning rather than by lexical matching. A core component of the process is the singular value decomposition (SVD), which is used to remove lexical noise from the term-document matrix (TDM). The topic of mathematical modelling for noise reduction in LSI is important and demands attention. In this paper, some observations on aspects of this topic are introduced. The work addresses a definition of noise in text processing and seeks to determine the best structure for the TDM, that is, the structure that would facilitate efficient searching within LSI.


“…For example, in [6], Hoenkamp shows how the technique underlying LSI is just one example of a unitary operator. The use of the Haar wavelet transform (HWT), as an alternative that shares this unitary property at much reduced computational cost, has also been suggested. One of the most recent works has emphasized dimension reduction in the LSI system [7]. Other researchers have used LSI in the field of image IR [8] [9].…”

Section: Tareq Jaber

Citation type: mentioning

confidence: 99%