Michal Duracik scite author profile

Summary Plagiarism is becoming an increasingly serious problem in the academic environment. In this paper, we deal with a specific kind of plagiarism: source code plagiarism. For this kind, there is no software available for detecting plagiarism on a larger scale (hundreds of student submissions every year). We propose algorithms for source code parsing and processing as part of a complex system for plagiarism detection. Source code vectorization using characteristic vectors is a vital part of the whole process, and the k-means algorithm helps with the classification and clustering of the vectors. Student assignments are submitted regularly, and any plagiarism detection system needs to handle them as they arrive. For this reason, we propose a modified incremental k-means algorithm and a method for determining the number of clusters. We also consider methods for searching vectors among clusters and suggest using conditional entropy to select the important vector elements used in the search algorithm. Our results show how the proposed algorithms and methods perform on real student submissions.
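The incremental clustering idea mentioned in the summary can be illustrated with a minimal sketch. This is a generic online k-means-style update, not the authors' modified algorithm: the function name `cluster_incrementally`, the parameter `new_cluster_distance`, and the rule of opening a new cluster when a vector is too far from every centroid are all assumptions made for illustration, and the toy 2-D tuples merely stand in for characteristic vectors.

```python
import math


def cluster_incrementally(vectors, new_cluster_distance):
    """Assign vectors to clusters one at a time (hypothetical sketch).

    Each vector joins the nearest centroid, or starts a new cluster when
    the nearest centroid is farther than new_cluster_distance (an assumed
    stand-in for a method that determines the number of clusters).
    """
    centroids = []  # running mean of each cluster
    counts = []     # number of vectors assigned to each cluster
    labels = []     # cluster index chosen for each input vector

    for v in vectors:
        # Find the nearest existing centroid by Euclidean distance.
        best, best_dist = None, math.inf
        for i, c in enumerate(centroids):
            d = math.dist(v, c)
            if d < best_dist:
                best, best_dist = i, d

        if best is None or best_dist > new_cluster_distance:
            # No cluster is close enough: open a new one at this vector.
            centroids.append([float(x) for x in v])
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            # Update the running mean without revisiting earlier vectors.
            counts[best] += 1
            n = counts[best]
            centroids[best] = [c + (x - c) / n
                               for c, x in zip(centroids[best], v)]
            labels.append(best)

    return centroids, labels


# Toy data: two well-separated groups of submissions.
centroids, labels = cluster_incrementally(
    [(0, 0), (0, 2), (10, 10), (10, 12)], new_cluster_distance=5.0
)
print(labels)
```

Because each centroid is a running mean, a newly submitted vector can be placed without re-clustering the earlier ones, which is the property that makes an incremental scheme attractive when assignments arrive over time.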


Scalable Source Code Plagiarism Detection Using Source Code Vectors Clustering

Duracik, Kršák, Hrkut (2018)


Source Code Representations for Plagiarism Detection

Duracik, Kršák, Hrkut (2018)


Issues with the Detection of Plagiarism in Programming Courses on a Larger Scale

Duracik, Kršák, Hrkut (2018)



Michal Duracik

Current Trends in Source Code Analysis, Plagiarism Detection and Issues of Analysis Big Datasets

Searching source code fragments using incremental clustering
