2017
DOI: 10.5120/ijca2017913699

Information Retrieval using Cosine and Jaccard Similarity Measures in Vector Space Model

Abstract: With the exponential growth of documents available to us on the web, the requirement for an effective technique to retrieve the most relevant document matching a given search query has become critical. The field of Information Retrieval deals with the problem of document similarity to retrieve desired information from a large amount of data. Various models and similarity measures have been proposed to determine the extent of similarity between two objects. The objective of this paper is to summarize the entire…

Cited by 16 publications (13 citation statements)
References 9 publications
“…First, we use Jaccard similarity as another measure of content similarity. Jaccard similarity is another basic and effective measure of the similarity between documents (Jain et al., 2017). It measures the proportion of common words to unique words in two documents.…”
Section: Results (mentioning)
confidence: 99%
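
The measure described in the statement above (shared words divided by distinct words across both documents) can be sketched in a few lines. This is a minimal illustration, not code from the cited paper: the function name `jaccard_similarity`, the lowercase whitespace tokenization, and the choice to return 0.0 for two empty documents are all assumptions made for the example.

```python
def jaccard_similarity(doc_a: str, doc_b: str) -> float:
    """Ratio of words shared by both documents to distinct words in either."""
    # Assumption: tokens are lowercase, whitespace-separated words.
    words_a = set(doc_a.lower().split())
    words_b = set(doc_b.lower().split())

    union = words_a | words_b
    if not union:
        # Assumption: two empty documents are scored 0.0 to avoid dividing by zero.
        return 0.0

    return len(words_a & words_b) / len(union)


# Three shared words ("the", "cat", "on") out of seven distinct words.
score = jaccard_similarity("the cat sat on the mat", "the cat lay on the rug")
print(round(score, 4))
```

Because the inputs are reduced to sets, repeated words and word order have no effect on the score; a real pipeline would likely also strip punctuation and stop words before comparing.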
“…There were some limitations in this stage, such as the summarization made in Wikipedia's references, not-mentioning the full name of the journals and the one-word name of some journals. A similarity-finding approach was applied to determine the similarity between the references and journal titles aimed at overcoming these limitations. The Jaccard index was used to determine the similarity between the scientific references of the Persian Wikipedia and the titles of peer-reviewed journals because of its higher accuracy in this study and its frequent usage in text similarity-finding studies with similarity rates of 70% (Jain et al., 2017; Niwattanakul et al., 2013). Finally, the pages that used reputable PSPs in their references were identified.…”
Section: Methods (mentioning)
confidence: 99%
“…The configuration used for the LSTM model included For similarity metrics between groups of terms, we considered two different approaches: (i) Jaccard similarity as a baseline [65], [66], and (ii) our own sophisticated metric based on lexical and semantic proximity [67]. The cosine distance was discarded provided that the terms in the descriptions had no logical ordering.…”
Section: A Experimental Data-set (mentioning)
confidence: 99%