2021
DOI: 10.1186/s41239-021-00277-8

Paraphrase type identification for plagiarism detection using contexts and word embeddings

Abstract: Paraphrase types have been proposed by researchers as the paraphrasing mechanisms underlying acts of plagiarism. Synonymous substitution, word reordering and insertion/deletion have been identified as some of the common paraphrasing strategies used by plagiarists. However, similarity reports generated by most plagiarism detection systems provide a similarity score and produce matching sections of text with their possible sources. In this research we propose methods to identify two important paraphrase types – …

Citations: cited by 16 publications (13 citation statements)
References: 43 publications (70 reference statements)
“…Dey et al [9] applied a Support Vector Machine (SVM) classifier to identify semantically similar tweets and other short texts. A very recent work studied word embedding models for paraphrase sentence pairs with word reordering and synonym substitution [1]. In this work, we focus on detecting paraphrases without access to pairs as it represents a realistic scenario without pair information.…”
Section: Related Work (mentioning)
confidence: 99%
“…Furthermore, Corbeil and Ghadivel have used three BERT models and XLNET for uninterrupted paraphrasing detection to the MSRP corpus, achieving 85.8%-91.5% of F1 results (Corbeil & Ghadivel, 2020). Alvi et al used CS corpus and ConceptNet Numberbatch pre-trained word embeddings (Alvi et al, 2021). They reported an F1 score of 90.6% for identifying word reorderings and an F1 score of 80.2% for identifying synonymous substitutions for the entire dataset.…”
Section: Related Work (mentioning)
confidence: 99%
“…It helps to determine the cosine angle. When the result is bound to [0, 1], cosine similarity is particularly effective [19]. The cosine similarity of the two vectors in the same orientation is 1, and the relative 90° orientation is 0.…”
Section: Cosine Similarity (mentioning)
confidence: 99%
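
The two reference values in the statement above (1 for vectors with the same orientation, 0 for vectors at 90° to each other) can be checked with a minimal sketch. This is an illustration only, not code from the cited or citing papers; the function name `cosine_similarity` and the example vectors are assumptions chosen for the demonstration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (illustrative helper)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Hypothetical example vectors: the second is a scaled copy of the first,
# so both point in the same direction.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])

# Hypothetical example vectors at a right angle to each other.
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])

print(same_direction, orthogonal)
```

The result stays within [0, 1] only when the vectors have no negative components; for general embedding vectors, which can have them, it ranges over [-1, 1].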
“…They used PAN-PC-11 and PAN-14 datasets for training and testing purposes, respectively. Alvi et al [19] proposed a paraphrase identification approach and plagiarism detection tool based on contexts and word embeddings. Son et al [20] proposed a plagiarism detection approach using feature extraction techniques which is based on multi-layer long-short term memory (LSTM) networks.…”
Section: Introduction (mentioning)
confidence: 99%