As technology develops, the volume of information in every field, such as literature, technology, science, and medicine, is also growing at a rapid pace. Extracting related documents from a huge collection based on a user query is an interesting problem in the digital world. Document similarity techniques are used in many applications, such as text categorization, plagiarism detection, document clustering, information retrieval, machine translation, and question answering systems. Many algorithms have been developed for this purpose; they take a document or input query and match it against document databases. This paper proposes a novel approach that vectorizes each document and query with a normalized TF-IDF method and applies the cosine similarity function to extract the top 3 documents for a user query.
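The pipeline described above (normalized TF-IDF vectors, cosine similarity, top 3 results) can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the abstract does not say which normalization or IDF variant is used, so L2 normalization and a smoothed IDF are assumptions here, and the function names (`tokenize`, `build_idf`, `tfidf_vector`, `cosine`, `top_k`) and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; a real system would likely
    # also strip punctuation and stop words.
    return text.lower().split()


def build_idf(tokenized_docs):
    # Assumed smoothed IDF: log((1 + N) / (1 + df)) + 1.
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }


def tfidf_vector(tokens, idf):
    # Raw term frequency times IDF, then L2-normalized (assumed normalization).
    # Terms that never occur in the collection are ignored.
    counts = Counter(t for t in tokens if t in idf)
    weights = {term: count * idf[term] for term, count in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0.0:
        return {}
    return {term: w / norm for term, w in weights.items()}


def cosine(u, v):
    # Both vectors are unit length, so cosine similarity is the dot product.
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(term, 0.0) for term, w in u.items())


def top_k(query, docs, k=3):
    # Score every document against the query and return the k best
    # as (document index, score) pairs, highest score first.
    tokenized = [tokenize(d) for d in docs]
    idf = build_idf(tokenized)
    doc_vecs = [tfidf_vector(t, idf) for t in tokenized]
    q_vec = tfidf_vector(tokenize(query), idf)
    scored = [(cosine(q_vec, dv), i) for i, dv in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, score) for score, i in scored[:k]]


# Hypothetical sample collection, for illustration only.
docs = [
    "plagiarism detection compares document similarity",
    "machine translation converts text between languages",
    "document clustering groups similar text documents",
    "question answering returns a direct answer",
]
results = top_k("document similarity", docs)
```

Because the vectors are normalized to unit length up front, ranking reduces to a sparse dot product per document, which keeps the scoring step simple. Ties are broken by document index so the ordering is deterministic.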
On the internet, search plays a vital role in retrieving relevant answers to user-specific queries. One of the most promising applications of natural language processing and information retrieval is the question answering system, which directly provides an accurate answer instead of a set of documents. The main objective of information retrieval is to retrieve relevant documents from the huge volume of data on the internet using an appropriate model. Many models have been proposed for the retrieval process, such as the Boolean, vector space, and probabilistic models. The vector space model (VSM) is considered the best method for document ranking in information retrieval, offering an efficient document representation that combines simplicity and clarity. VSM uses a similarity function to measure the match between documents and user intent, and orders the resulting scores from largest to smallest. The documents and the query are assigned weights using the term frequency and inverse document frequency (TF-IDF) method. To retrieve the documents most relevant to the user query, the cosine similarity ranking function is applied to every document and the query. Documents with higher similarity scores are considered more relevant to the query and are ranked by these scores. This paper emphasizes different techniques of information retrieval and argues that the vector space model offers a realistic compromise in IR processing. It allows an effective weighting scheme that ranks the set of documents in order of relevance to the user query.
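For reference, the weighting and ranking steps described above are commonly written as follows. The abstract does not state which TF-IDF variant is used, so the unsmoothed form below is an assumption; here $N$ is the number of documents, $\mathrm{tf}_{t,d}$ is the count of term $t$ in document $d$, $\mathrm{df}_t$ is the number of documents containing $t$, and $V$ is the vocabulary.

```latex
% Assumed standard TF-IDF weight of term t in document d
% (the abstract does not specify the exact variant).
\[
  w_{t,d} = \mathrm{tf}_{t,d} \cdot \log\frac{N}{\mathrm{df}_t}
\]
% Cosine similarity between the query vector q and a document vector d.
\[
  \cos(\vec{q}, \vec{d}) =
  \frac{\sum_{t \in V} w_{t,q}\, w_{t,d}}
       {\sqrt{\sum_{t \in V} w_{t,q}^{2}}\;\sqrt{\sum_{t \in V} w_{t,d}^{2}}}
\]
```

Documents are then sorted by $\cos(\vec{q}, \vec{d})$ in descending order, so a document sharing more highly weighted terms with the query ranks higher.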