With the exponential growth of documents available on the web, the need for effective techniques to retrieve the documents most relevant to a given search query has become critical. The field of Information Retrieval addresses this by using document similarity to retrieve the desired information from large amounts of data. Various models and similarity measures have been proposed to determine the extent of similarity between two objects. The objective of this paper is to summarize the entire process, looking into some of the most well-known algorithms and approaches for matching a query text against a set of indexed documents.
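The abstract does not say which similarity measures the paper covers. As one common illustration of matching a query against indexed documents, the following minimal sketch ranks documents by cosine similarity over TF-IDF weights. The whitespace tokenizer, the smoothed IDF formula, and all function names are assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for a sketch.
    return text.lower().split()


def weight(tokens, idf):
    # Term frequency times inverse document frequency; unseen terms are dropped.
    tf = Counter(tokens)
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def build_index(docs):
    # Returns the IDF table and one sparse TF-IDF vector per document.
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumed variant) so that no weight is zero or negative.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return idf, [weight(tokens, idf) for tokens in tokenized]


def cosine(a, b):
    # Cosine of the angle between two sparse vectors stored as dicts.
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    # Returns (score, document index) pairs, best match first.
    idf, vectors = build_index(docs)
    query_vector = weight(tokenize(query), idf)
    scores = [(cosine(query_vector, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scores, key=lambda pair: (-pair[0], pair[1]))
```

Calling `rank("document similarity retrieval", docs)` on a small list of strings returns the documents ordered by score; documents that share no term with the query score 0.0.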
Data mining is the discovery and assessment of significant patterns in data, followed by the validation of the identified patterns. It evaluates the data from different perspectives and summarizes it into valuable information. This summarized information can then be used to design business strategies that increase revenue, reduce costs, or both. The Apriori association algorithm builds candidate item sets from previously computed frequent item sets and has to scan the entire transaction database on every pass, which becomes a problem with large datasets. With FP trees, unlike in the Apriori algorithm, there is no need for candidate generation, and the frequently occurring item sets are discovered just by traversing the FP tree. This paper discusses the FP tree concept and implements it in Java for a general social survey dataset. We use this approach to determine the association rules that occur in the dataset. In this manner, relevant rules and patterns can be established in any set of records.
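To make the FP tree idea concrete, here is a minimal sketch of FP-growth followed by association rule extraction. It is written in Python for brevity and is not the paper's Java implementation; the class and function names, the tie-breaking order, and the toy baskets in the usage note are all assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


class Node:
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(transactions, min_support):
    # transactions: list of (items, count) pairs.
    support = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}
    root = Node(None, None)
    header = defaultdict(list)  # item -> every tree node carrying that item
    for items, count in transactions:
        # Keep frequent items only, most frequent first (ties broken by name),
        # so that transactions with common items share a prefix path.
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent


def fp_growth(transactions, min_support, suffix=frozenset()):
    # Yields (itemset, support) pairs without generating candidate sets.
    header, frequent = build_tree(transactions, min_support)
    for item, item_support in frequent.items():
        itemset = suffix | {item}
        yield itemset, item_support
        # Conditional pattern base: the prefix path above each node of `item`.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        yield from fp_growth(base, min_support, itemset)


def mine(baskets, min_support):
    return dict(fp_growth([(basket, 1) for basket in baskets], min_support))


def association_rules(itemsets, min_confidence):
    # confidence(A -> B) = support(A and B together) / support(A)
    rules = []
    for itemset, itemset_support in itemsets.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                a = frozenset(antecedent)
                confidence = itemset_support / itemsets[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, confidence))
    return rules
```

As a usage example on hypothetical data, `mine(baskets, 3)` over a handful of shopping baskets returns a dictionary from frequent item sets to their support counts, and `association_rules(result, 0.8)` keeps the rules whose confidence is at least 0.8. The database is read only to build each tree; the frequent item sets then come from walking prefix paths, which is the contrast with Apriori drawn above.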