2014
DOI: 10.1002/cae.21608

Uncovering source code reuse in large‐scale academic environments

Abstract: The advent of the Internet has caused an increase in content reuse, including source code. The purpose of this research is to uncover potential cases of source code reuse in large-scale environments. A good example is academia, where massive courses are taught to students who must demonstrate that they have acquired the knowledge. The need to detect content reuse in quasi real-time encourages the development of automatic systems such as the one described in this paper for source code reuse detection. Our app…

Cited by 25 publications (18 citation statements)
References 25 publications
“…With respect to plagiarism detection, recently it has been approached also in source code [12,13], and a PAN shared task on the detection of SOurce COde (SOCO) re-use has been organised at the Forum for Information Retrieval Evaluation.…”
Section: Discussion (mentioning)
confidence: 99%
“…Evaluation Corpora. In the author profiling task at PAN 2013 [58] participants approached the task of identifying age and gender in a large corpus collected from social media, and age was annotated with three classes: 10s (13-17), 20s (23-27), and 30s (33-47). At PAN 2014, we continued to study the gender and age aspects of the author profiling problem, however, four corpora of different genres were considered (social media, blogs, Twitter, and hotel reviews), both in English and Spanish.…”
Section: Author Profiling: How Writing Style Is Shared (mentioning)
confidence: 99%
“…Existing literature on each topic is vast, so some authors have already surveyed approaches in automatic plagiarism detection. Here, we merely give reference to the most important studies, including [14,2,17,28]. …”
Section: Automatic Plagiarism Detection (mentioning)
confidence: 99%
“…Hence, the task of detecting source code re-use becomes even more difficult, since all the source codes will contain (to some extent) a considerable thematic overlap. On the other hand, detection of source code re-use in programming environments, such as programming contests, has an additional challenge: the large number of source codes that must be processed for detecting such practices [10]. As a result, source code reuse detection becomes somewhat infeasible. Consequently, most of the research on source code re-use detection has been applied to closed groups [23,20,13].…”
Section: Introduction (mentioning)
confidence: 99%