Lisan Sulistiani scite author profile

Comp Applic In Engineering

2018

Source code plagiarism detection using Running‐Karp‐Rabin Greedy‐String‐Tiling (RKRGST) is common practice in academic environments. However, this approach is time‐inefficient (due to RKRGST's cubic time complexity) and insensitive to token subsequence rearrangement. This paper proposes ES‐Plag, a plagiarism detection tool that features cosine‐based filtering and a penalty mechanism to handle these issues. Cosine‐based filtering mitigates the time inefficiency by excluding non‐potential pairs from RKRGST comparison, while the penalty mechanism mitigates the insensitivity by reducing the number of matched tokens by the number of matched subsequences prior to similarity normalization. In addition to these issue‐solving features, ES‐Plag also offers project‐based input, a colorized adjacency similarity matrix, matched token highlighting, and various similarity algorithms (e.g., Cosine Similarity and Local Alignment). Three findings can be deduced from our evaluation. First, cosine‐based filtering boosts time efficiency with a trade‐off in effectiveness. Second, the penalty mechanism enhances sensitivity, even though its improvement in effectiveness is quite limited. Third, ES‐Plag's features are beneficial for examiners.
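The two mechanisms named in this abstract can be illustrated with a minimal sketch. It is not ES‐Plag's actual implementation: the function names, the token-frequency representation, and the average-style normalization `2 * matched / (len_a + len_b)` are assumptions made for illustration, and the RKRGST matching step itself is left out (its matched subsequence lengths are taken as given).

```python
from collections import Counter
from math import sqrt


def cosine_similarity(tokens_a, tokens_b):
    """Order-insensitive similarity over token frequency vectors (assumed representation)."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    # Dot product over the tokens the two sequences share.
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = sqrt(sum(c * c for c in freq_a.values()))
    norm_b = sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def penalized_similarity(matched_lengths, len_a, len_b):
    """Subtract one token per matched subsequence before normalizing.

    matched_lengths: lengths of the matched subsequences reported by a
    string-matching step such as RKRGST (not implemented here).
    The normalization formula is an assumption, not taken from the paper.
    """
    matched_tokens = sum(matched_lengths)
    penalized = max(matched_tokens - len(matched_lengths), 0)
    total = len_a + len_b
    return 2 * penalized / total if total else 0.0


if __name__ == "__main__":
    # Reordered tokens still look identical to the cosine filter.
    print(cosine_similarity(["if", "id", "op"], ["op", "id", "if"]))
    # Seven matched tokens in one piece score higher than seven in two pieces.
    print(penalized_similarity([7], 10, 10), penalized_similarity([4, 3], 10, 10))
```

Under these assumptions, a pair would only be passed to the expensive comparison when its cosine similarity reaches some threshold, and fragmented matches (a typical result of rearranged code) would score lower than one contiguous match of the same total length.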

Which Source Code Plagiarism Detection Approach is More Humane?

2018

This paper contributes to developing source code plagiarism detection that is more aligned with the human perspective. Three evaluation mechanisms that directly relate the human perspective to the evaluated approaches are proposed: think-aloud, aspect-oriented, and empirical mechanisms. Using those mechanisms, a comparative study of attribute-based and structure-based plagiarism detection approaches (i.e., two popular approach categories in source code plagiarism detection) is conducted. According to that study, the structure-based approach is more effective than the attribute-based one; its signature aspect and resulting similarity degrees are more closely related to human preferences. In addition, this approach is related to most human-oriented aspects for suspecting source code plagiarism.

Dynamic Thresholding Mechanisms for IR-Based Filtering in Efficient Source Code Plagiarism Detection

2018

To solve the time inefficiency issue, only potential pairs are compared in string-matching-based source code plagiarism detection, wherein potentiality is defined through a fast-yet-order-insensitive similarity measurement (adapted from Information Retrieval) and only pairs whose similarity degrees are higher than or equal to a particular threshold are selected. Defining such a threshold is not a trivial task, considering that the threshold should lead to high efficiency improvement and low effectiveness reduction (if the latter is unavoidable). This paper proposes two thresholding mechanisms, namely the range-based and pair-count-based mechanisms, that dynamically tune the threshold based on the distribution of the resulting similarity degrees. According to our evaluation, both mechanisms are more practical to use than manual threshold assignment since they are more proportional to efficiency improvement and effectiveness reduction.
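The abstract does not spell out how the two mechanisms compute their thresholds, so the following is only one plausible reading, offered as a sketch. The formulas, the `ratio` and `keep` parameters, and all function names are hypothetical: here the range-based threshold sits at a fixed fraction of the observed similarity range, and the pair-count-based threshold is the similarity of the k-th most similar pair.

```python
def range_based_threshold(similarities, ratio=0.5):
    """Hypothetical reading: a point at `ratio` of the way from min to max similarity."""
    if not similarities:
        raise ValueError("at least one similarity degree is required")
    lowest, highest = min(similarities), max(similarities)
    return lowest + ratio * (highest - lowest)


def pair_count_based_threshold(similarities, keep=10):
    """Hypothetical reading: the similarity of the `keep`-th most similar pair."""
    if not similarities or keep < 1:
        raise ValueError("need at least one similarity degree and keep >= 1")
    ranked = sorted(similarities, reverse=True)
    return ranked[min(keep, len(ranked)) - 1]


def select_pairs(pair_similarities, threshold):
    """Keep only pairs whose similarity is higher than or equal to the threshold."""
    return [pair for pair, sim in pair_similarities.items() if sim >= threshold]


if __name__ == "__main__":
    sims = {("a", "b"): 0.9, ("a", "c"): 0.5, ("b", "c"): 0.1}
    threshold = pair_count_based_threshold(list(sims.values()), keep=2)
    print(select_pairs(sims, threshold))
```

Either way, the threshold follows the distribution of the computed similarity degrees instead of being fixed by hand, which is the property the abstract emphasizes.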

An Embedding Technique for Language-Independent Lecturer-Oriented Program Visualization

2018

emitter

A Program Visualization (PV) tool aims to help novice programmers learn how a particular program works through interactive and descriptive visualization. However, most such tools are language-dependent: they use either a language-dependent debugger or language-dependent code to generate the visualization. Such dependency may become a problem when a program written in a new programming language is incorporated. Therefore, this paper proposes an embedding technique to handle this issue. To incorporate a new programming language, only five language-dependent features need to be set: the compile command, the run command, the library-import instruction set, the file-writer function-declaration instructions, and the file-writer function-invocation instruction. In general, our proposed technique works in three steps: embedding some statements into the target program, generating visualization states by running the program with console commands, and visualizing the given program based on the generated visualization states. According to our evaluation, the proposed technique is able to incorporate programs written in any programming language as long as that language provides the required language-dependent features. Further, it is practical to use since it retains the benefits of conventional PV despite its language-independent behavior.

Dynamic Thresholding Mechanisms for IR-Based Filtering in Efficient Source Code Plagiarism Detection

Karnalim, Sulistiani

2018

Preprint

Which Source Code Plagiarism Detection Approach is More Humane?

Karnalim, Sulistiani

2018

Preprint

Automatic Topic Clustering Using Latent Dirichlet Allocation with Skip-Gram Model on Final Project Abstracts

Bunyamin

2017