Automatic query expansion is widely regarded as the most important method for overcoming the word mismatch problem in information retrieval. Many researchers have long used thesauri as a tool for query expansion. However, usually only one type of thesaurus has been used. In this paper we analyze the characteristics of different thesaurus types and propose a method that combines them for query expansion. Experiments on the TREC collection showed that our method is more effective than methods that use only one type of thesaurus.
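The word mismatch problem, and why drawing on more than one thesaurus can help, can be illustrated with a toy example. This is a minimal sketch, not the method proposed in the paper: the two thesauri, their illustrative names (`hand_crafted`, `co_occurrence`), their entries, and the simple overlap score are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical model of a thesaurus: a term mapped to the set of terms it is related to.
Thesaurus = Dict[str, Set[str]]

def expand_query(query: List[str], thesauri: List[Thesaurus]) -> Set[str]:
    """Add to the query every term that any of the thesauri relates to a query term."""
    expanded = set(query)
    for thesaurus in thesauri:
        for term in query:
            expanded |= thesaurus.get(term, set())
    return expanded

def overlap(query_terms: Set[str], document: List[str]) -> int:
    """Count distinct terms shared by query and document (a crude matching score)."""
    return len(query_terms & set(document))

# Invented toy entries standing in for two different thesaurus types.
hand_crafted: Thesaurus = {"car": {"automobile"}}
co_occurrence: Thesaurus = {"crash": {"collision"}}

query = ["car", "crash"]
document = ["automobile", "collision", "report"]

print(overlap(set(query), document))                                          # 0: word mismatch
print(overlap(expand_query(query, [hand_crafted]), document))                 # 1
print(overlap(expand_query(query, [hand_crafted, co_occurrence]), document))  # 2
```

Each thesaurus contributes relations the other lacks, so the document matches fully only when both are combined.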
Information about a computer program can be extracted from its source code, its external documentation, and its compiled code. Although compiled code is a guaranteed information source, since it always exists in published programs, existing search engines seldom use it because doing so requires some reverse engineering. In this research, we develop a search engine for Java archives that uses bytecode (the compiled code inside a Java Archive) as its information source. It enables users to search a collection of Java Archives without relying on source code or external documentation. Compared with Penta and FindJar [2][7], we propose a novel term extraction process that goes beyond file and class names, covering field names, method names, string literals used in the program, program-flow weighting, and method expansion. Specialized tokenization, stopping, and stemming are also implemented to improve effectiveness. The evaluation shows fairly good effectiveness, although it varies with the terms stored in the index. Its effectiveness is higher than that of a reimplementation of FindJar's main features, which indicates that detailed compiled-code information benefits computer program search engines. Efficiency depends on how many terms are stored in the index and how many processes are used at each step.
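The tokenization, stopping, and stemming steps can be sketched for identifiers of the kind found in a class file (class, field, and method names). The abstract does not give the exact rules, so everything here is an assumption: the camelCase/underscore splitting regex, the small stopword list, and `crude_stem`, a toy suffix stripper standing in for a real stemmer such as Porter's. The sketch also assumes the names have already been read from the bytecode; parsing the class file itself is not shown.

```python
import re
from typing import List

# Assumed splitting rule: runs of capitals (acronyms), Capitalised or lowercase words, digit runs.
# Underscores and other symbols match no alternative, so they act as separators.
_IDENTIFIER_PART = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")

# Hypothetical stopword list mixing English function words and common Java accessor prefixes.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "all", "get", "set", "is"}

def split_identifier(name: str) -> List[str]:
    """Split a Java identifier such as 'getXMLParser' or 'MAX_SIZE' into word parts."""
    return _IDENTIFIER_PART.findall(name)

def crude_stem(token: str) -> str:
    """Toy suffix stripper standing in for a real stemmer (e.g. Porter)."""
    for suffix, replacement in (("ies", "y"), ("ing", ""), ("s", "")):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            if suffix == "s" and token.endswith("ss"):
                continue  # keep words like 'class' intact
            return token[: -len(suffix)] + replacement
    return token

def extract_terms(names: List[str]) -> List[str]:
    """Tokenize, lowercase, remove stopwords, and stem a list of identifiers."""
    terms = []
    for name in names:
        for part in split_identifier(name):
            token = part.lower()
            if token not in STOPWORDS:
                terms.append(crude_stem(token))
    return terms

names = ["FileReader", "readAllLines", "MAX_BUFFER_SIZE", "loadingEntries", "getXMLParser"]
print(extract_terms(names))
```

`crude_stem` will mangle words such as "status" (it strips the final "s"), which is one reason a proper stemmer is preferable in practice.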
Palm oil prices can rise, fall, or stay the same from day to day because of the factors that influence them, such as the prices of other vegetable oils (soybean oil and canola oil), the world crude oil price, and the real exchange rate of the dollar against the currencies of producing countries (rupiah, ringgit, and the Canadian dollar) or of consuming countries (rupee). A reasonably accurate palm oil price prediction is therefore needed so that investors can earn profits according to their plans. The aim of this study is to compare the accuracy, precision, and recall achieved by the Naïve Bayes, Support Vector Machine, and K-Nearest Neighbor algorithms in predicting palm oil prices for investment. The test results show that the Support Vector Machine algorithm achieves higher accuracy, precision, and recall than the Naïve Bayes and K-Nearest Neighbor algorithms. The highest accuracy in this study is 82.46%, the highest precision is 86%, and the highest recall is 89.06%.
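The three metrics used to compare the classifiers can be computed as follows. The labels `up`, `down`, and `flat` and the predictions are invented for illustration, and the abstract does not say how precision and recall were aggregated over classes, so computing them one-vs-rest for the `up` class is an assumption.

```python
from typing import Sequence, Tuple

def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that equal the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> Tuple[float, float]:
    """Precision and recall for one class, treating every other class as negative."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical daily price movements and one classifier's predictions.
y_true = ["up", "up", "down", "flat", "up", "down", "up", "flat"]
y_pred = ["up", "down", "down", "flat", "up", "up", "up", "up"]

print(accuracy(y_true, y_pred))                # 0.625
print(precision_recall(y_true, y_pred, "up"))  # (0.6, 0.75)
```

Precision penalizes the two days wrongly predicted as `up`, while recall penalizes the one `up` day that was missed; this is why the two metrics can rank classifiers differently.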
This paper proposes a method to overcome the drawbacks of WordNet when applied to information retrieval by complementing it with Roget's thesaurus and corpus-derived thesauri. Words and relations that are not included in WordNet can be found in the corpus-derived thesauri. The effects of polysemy can be minimized with a weighting method that considers all query terms and all of the thesauri. Experimental results show that our method significantly improves information retrieval performance.
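The idea of weighting an expansion term against the whole query, rather than against a single query term, can be sketched as below. The combination rule (average a term pair's similarity over all thesauri, then sum over all query terms), the thesaurus names, and every similarity value are assumptions for illustration, not the formula from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: term -> {related term: similarity in [0, 1]}.
Thesaurus = Dict[str, Dict[str, float]]

def combined_similarity(q: str, c: str, thesauri: List[Thesaurus]) -> float:
    """Average similarity of q and c over all thesauri (0 where a thesaurus lacks the pair)."""
    return sum(t.get(q, {}).get(c, 0.0) for t in thesauri) / len(thesauri)

def rank_expansion_terms(query: List[str], thesauri: List[Thesaurus], top_k: int) -> List[Tuple[str, float]]:
    """Weight each candidate by its combined similarity to every query term."""
    candidates = {c for t in thesauri for q in query for c in t.get(q, {})}
    candidates -= set(query)  # do not re-add terms already in the query
    weights = {c: sum(combined_similarity(q, c, thesauri) for q in query) for c in candidates}
    # Highest weight first; ties broken alphabetically for a stable order.
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]

# Invented scores standing in for WordNet, Roget's, and a corpus-derived thesaurus.
wordnet_like: Thesaurus = {"bank": {"river": 0.8, "loan": 0.6}, "interest": {"loan": 0.5}}
roget_like: Thesaurus = {"bank": {"river": 0.7, "money": 0.6}, "rate": {"loan": 0.4}}
corpus_like: Thesaurus = {"interest": {"loan": 0.7, "money": 0.6}, "rate": {"money": 0.6}}

query = ["bank", "interest", "rate"]
for term, weight in rank_expansion_terms(query, [wordnet_like, roget_like, corpus_like], top_k=3):
    print(f"{term}: {weight:.3f}")
```

Here `river` is the closest neighbour of `bank` taken alone, but because it is unrelated to `interest` and `rate` it ranks below `loan` and `money`, which is the kind of polysemy effect such a weighting is meant to dampen.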