2017
DOI: 10.1016/j.procs.2017.06.038

Plagiarism detection using document similarity based on distributed representation

Cited by 14 publications (6 citation statements)
References 5 publications
“…Longest common subsequence (LCS) method: this consists of finding the longest subsequence common to all sequences in a set of sequences. The longest common subsequence problem is a classic computer science problem, the basis of data comparison programs such as the diff utility, and has applications in computational linguistics and bioinformatics [24]. Word Mover's Distance (WMD): uses word embeddings to calculate the similarities; more precisely, it uses a normalized bag-of-words representation together with word embeddings to calculate the distance between documents [25].…”
Section: Methods (mentioning)
confidence: 99%
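The LCS measure mentioned in this citation context can be illustrated briefly. The sketch below is a minimal implementation of the classic LCS dynamic program over token lists; it illustrates the general technique, not the cited papers' code. (For WMD, off-the-shelf implementations such as gensim's KeyedVectors.wmdistance are commonly used in practice.)

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists
    (classic dynamic-programming formulation, O(len(a) * len(b)))."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

# Example: token-level LCS between two short sentences
doc1 = "the quick brown fox jumps over the lazy dog".split()
doc2 = "a quick brown dog jumps over a lazy fox".split()
print(lcs_length(doc1, doc2))  # -> 5 ("quick brown ... jumps over ... lazy")
```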
“…The similarity between vectors was computed using cosine similarity [24]. The aim of this approach is to evaluate the validity of using the distributed representation to define word similarity. They introduce three methods based on the following three document similarities for two documents: the length of the longest common subsequence (LCS) divided by the length of the shorter document, the local maximal value of the length of the LCS, and the local maximal value of the weighted length of the LCS.…”
Section: Related Work (mentioning)
confidence: 99%
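As an illustration of the cosine-similarity step described in this excerpt, the sketch below compares two hypothetical document vectors (for example, averaged word embeddings; the exact distributed representation used in [24] is not specified here, so the vectors are placeholders).

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two vectors: u . v / (|u| * |v|)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical document vectors, e.g. averaged word embeddings.
doc_vec_a = np.array([0.12, -0.40, 0.33, 0.05])
doc_vec_b = np.array([0.10, -0.35, 0.30, 0.02])
print(cosine_similarity(doc_vec_a, doc_vec_b))  # close to 1.0 for similar documents
```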
“…The LCS algorithm has several benefits [3], such as better scalability, lower computing-power requirements, and support for grammar checking. In the realm of image plagiarism detection [4], numerous methodologies have been proposed, including the scale-invariant feature transform (SIFT) algorithm. This algorithm analyzes and extracts the primary elements of two images to ascertain their similarity.…”
Section: Literature Survey (mentioning)
confidence: 99%
“…In December 2016 [15], a plagiarism detection method was proposed by the authors, based on the principle of the local maximal value of the longest common subsequence (LCS) by its length and weight. They introduce three methods based on the following three document similarities for two documents: • the length of the LCS divided by the length of the shorter document, • the local maximal value of the length of the LCS, and • the local maximal value of the weighted length of the LCS.…”
Section: Related Work (mentioning)
confidence: 99%
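Of the three document similarities listed in this quotation, the first (LCS length divided by the length of the shorter document) can be sketched directly; the two variants based on local maximal values of the (weighted) LCS length depend on definitions given in the cited paper [15] and are not reproduced here. The sketch below reuses the lcs_length function from the earlier example.

```python
def lcs_ratio_similarity(tokens_a, tokens_b):
    """First similarity from the quotation above: length of the LCS divided
    by the length of the shorter document, measured in tokens.
    Assumes lcs_length (defined in the earlier sketch) is in scope."""
    shorter = min(len(tokens_a), len(tokens_b))
    if shorter == 0:
        return 0.0
    return lcs_length(tokens_a, tokens_b) / shorter
```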
“…Table 1 below gives a complete comparison of the available work on plagiarism detection, drawn from different references such as [15], [16], [17], [18], [19], [20], [21]. Table 1.…”
Section: Related Work (mentioning)
confidence: 99%