2015
DOI: 10.5120/21257-4109

A Review on Text Similarity Technique used in IR and its Application

Abstract: With the large number of documents on the web, there is an increasing need to be able to retrieve the most relevant documents. There are different techniques through which we can retrieve the most relevant document from a large corpus. Similarity between words, sentences, paragraphs and documents is an important component in various tasks such as information retrieval, document clustering, word-sense disambiguation, automatic essay scoring, short answer grading, machine translation and text summarization. Text similar…

Cited by 40 publications (23 citation statements)
References 12 publications
“…Which means the types that are processed in the proposed system are limited in two types: Words with length ≤5 have the possibility of one error only, whereas words with length >5 allow errors with two letters as the maximum probability. Then, to get the best system performance, the proposed system used the integrated number of similarity measures which Gives the best result in case of short string and it is fast and best suited for strings similarity (Pradhan, et al, 2015; Patel, 2016). In case of long string cost of Levenshtein distance is same as the length of string and considered it is not order of sequence of characters while comparing (Pradhan, et al, 2015; Patel, 2016). Longest common subsequence: Uses the recursion approach which uses stack that takes lots of space (Pradhan, et al, 2015). Jaro-Winkler: Gives better result in case of hybrid method (Pradhan, et al, 2015). If the data size is too much large, then Jaro distance similarity not gives efficient results (Pradhan, et al, 2015) …”
Section: Methods (mentioning)
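The quoted passage pairs Levenshtein edit distance with a length-dependent error budget: at most one error for words of length ≤5, at most two for longer words. A minimal Python sketch of that combination follows, assuming an "error" means one Levenshtein edit (insertion, deletion or substitution) and that length is measured on the dictionary word. `max_errors` and `is_acceptable_match` are hypothetical names, not taken from the citing paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (classic DP, two rows)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the stored row short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def max_errors(word: str) -> int:
    """Hypothetical helper encoding the quoted rule (assumption):
    length <= 5 -> one error allowed, length > 5 -> two errors allowed."""
    return 1 if len(word) <= 5 else 2


def is_acceptable_match(candidate: str, dictionary_word: str) -> bool:
    """Accept the candidate if it is within the error budget of the word."""
    return levenshtein(candidate, dictionary_word) <= max_errors(dictionary_word)


print(levenshtein("kitten", "sitting"))               # 3
print(is_acceptable_match("helo", "hello"))           # True: 1 edit, length 5
print(is_acceptable_match("retreival", "retrieval"))  # True: 2 edits, length 9
```

The table-filling loop costs O(m·n) time for strings of lengths m and n, so each comparison becomes more expensive as the strings grow.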
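The same passage notes that a recursive longest-common-subsequence implementation consumes a lot of stack space. A bottom-up dynamic-programming version avoids recursion entirely; the sketch below keeps only two rows of the table, so memory grows with the shorter string. Dividing by the longer string's length in `lcs_similarity` is an assumed way of turning the LCS length into a score, not one stated in the source.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, computed bottom-up with
    two rows instead of recursion, so no call stack grows with the input."""
    if len(a) < len(b):
        a, b = b, a  # keep the stored rows as short as possible
    previous = [0] * (len(b) + 1)
    for ca in a:
        current = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                current.append(previous[j - 1] + 1)  # extend a common subsequence
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def lcs_similarity(a: str, b: str) -> float:
    """Hypothetical normalisation: LCS length divided by the longer length."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


print(lcs_length("ABCBDAB", "BDCABA"))                 # 4
print(round(lcs_similarity("AGGTAB", "GXTXAYB"), 3))  # 0.571
```

Time is still O(m·n); only the memory and the recursion depth are reduced.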
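For the Jaro and Jaro-Winkler measures named in the passage, a sketch of the standard definitions follows. It assumes the conventional Winkler parameters (prefix scale p = 0.1, common prefix capped at four characters); the citing paper does not say which parameters it used.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if equal and within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        lo = max(0, i - window)
        hi = min(i + window + 1, len2)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order; each pair of
    # out-of-order positions counts as one transposition.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro-Winkler: boost the Jaro score for a shared prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


print(round(jaro("MARTHA", "MARHTA"), 4))          # 0.9444
print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

For "MARTHA" and "MARHTA" there are 6 matches and 1 transposition, giving a Jaro score of 17/18; the shared prefix "MAR" lifts it to about 0.961. Some implementations apply the prefix boost only when the Jaro score exceeds a threshold (often 0.7); this sketch always applies it.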
“…Similarity technique is high (Pradhan, et al, 2015). They are not suitable at multilingual environment, and the accuracy is very less (Pande, et al, 2013; Pradhan, et al, 2015) …”
Section: N-gram (mentioning)
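The N-gram statement lists drawbacks but not the technique itself. One common character n-gram measure is the Dice coefficient over sets of overlapping bigrams; the sketch below assumes that variant (no padding, duplicate n-grams collapsed into a set), which may differ from the exact formulation in the surveyed work. The function names are illustrative.

```python
def char_ngrams(text: str, n: int = 2) -> set[str]:
    """Set of overlapping character n-grams (no padding)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def dice_ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) over character n-gram sets."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        # Both strings are shorter than n: fall back to exact comparison.
        return 1.0 if a == b else 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


print(dice_ngram_similarity("night", "nacht"))  # 0.25
```

"night" and "nacht" each have four bigrams and share only "ht", so the score is 2·1/(4+4) = 0.25.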
“…Text similarity is a well-studied problem in information retrieval (Pradhan et al (2015); Nagwani et al (2015)). Over the years, many techniques have been proposed to measure the distance/similarity of documents based on features such as word frequencies, word patterns in sentences, etc.…”
Section: Introduction (mentioning)
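The statement above mentions measuring document similarity from features such as word frequencies. One standard way to use such features is cosine similarity between term-frequency vectors, which also gives a simple way to rank documents against a query. The sketch below is a minimal version assuming lower-cased `\w+` tokens and raw counts (no stop-word removal, stemming, or IDF weighting); `term_frequencies` and `rank` are illustrative names.

```python
import math
import re
from collections import Counter


def term_frequencies(text: str) -> Counter:
    """Lower-cased word counts using a deliberately simple tokenizer."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: str, documents: list[str]) -> list[tuple[float, str]]:
    """Return (score, document) pairs, most similar first."""
    q = term_frequencies(query)
    scored = [(cosine_similarity(q, term_frequencies(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


docs = [
    "text similarity in information retrieval",
    "document clustering with word frequencies",
    "machine translation of short sentences",
]
for score, doc in rank("text similarity retrieval", docs):
    print(f"{score:.3f}  {doc}")
```

Weighting the counts by inverse document frequency (TF-IDF) is a common refinement when frequent, uninformative words dominate the raw counts.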
“…There are many surveys that review sentence similarity issue [10][11][12][13] . Unlike other surveys, this survey distinguishes between words similarity methods and sentences similarity methods.…”
Section: Introduction (mentioning)