Proceedings of the 28th International Conference on Software Engineering 2006
DOI: 10.1145/1134285.1134445

Effective identification of source code authors using byte-level information

Abstract: Source code author identification deals with the task of identifying the most likely author of a computer program, given a set of predefined author candidates. This is usually based on the analysis of other program samples of undisputed authorship by the same programmer. There are several cases where the application of such a method could be of major benefit, such as authorship disputes, proof of authorship in court, tracing the source of code left in the system after a cyber attack, etc. We present a new a…

Cited by 80 publications (79 citation statements). References: 8 publications.
“…The distance function favors that author because the union of the profile of the unseen text and the profile of that author will result in significantly fewer n-grams, so the distance between the unseen text and that author would be estimated as quite low in comparison to the other authors. To overcome that problem, Frantzeskou, Stamatatos, Gritzalis, and Katsikas (2006) proposed a different and simpler distance, called simplified profile intersection (SPI), which simply counts the number of common n-grams of the two profiles, disregarding the rest. The application of this measure to author identification of source code provided better results than the original CNG distance.…”
Section: CNG and Variants (mentioning; confidence: 99%)
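To make the bias described in this statement concrete, here is a minimal Python sketch. It assumes that a profile is a dict mapping each n-gram to a positive normalized frequency, and that the CNG dissimilarity sums the term (2(f1 − f2)/(f1 + f2))² over the union of both profiles. The function names and the toy profiles are hypothetical and are not taken from the paper.

```python
def cng_dissimilarity(p1: dict, p2: dict) -> float:
    """CNG-style dissimilarity summed over the UNION of both profiles.

    Assumes every stored frequency is positive, so f1 + f2 > 0 for each n-gram.
    An n-gram present in only one profile contributes the maximum term, 4.
    """
    total = 0.0
    for gram in set(p1) | set(p2):
        f1 = p1.get(gram, 0.0)
        f2 = p2.get(gram, 0.0)
        total += (2 * (f1 - f2) / (f1 + f2)) ** 2
    return total


def spi(p1: dict, p2: dict) -> int:
    """Simplified profile intersection: number of n-grams common to both profiles."""
    return len(set(p1) & set(p2))


# Hypothetical toy profiles: the "small" author shares one n-gram with the
# unseen text, while the "large" author shares both but also has extra n-grams.
unseen = {"a": 0.5, "b": 0.5}
small_author = {"a": 1.0}
large_author = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}

for name, profile in [("small", small_author), ("large", large_author)]:
    print(name, round(cng_dissimilarity(unseen, profile), 3), spi(unseen, profile))
```

Here the CNG dissimilarity is lower for the small profile (about 4.444 versus 8.889), because its union with the unseen profile contains fewer n-grams that appear on only one side. SPI instead ranks the large profile higher, since it shares two n-grams rather than one.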
“…More importantly, the plethora of available electronic texts revealed the potential of authorship analysis in various applications (Madigan, Lewis, Argamon, Fradkin, & Ye, 2005) in diverse areas including intelligence (e.g., attribution of messages or proclamations to known terrorists, linking different messages by authorship) (Abbasi & Chen, 2005), criminal law (e.g., identifying writers of harassing messages, verifying the authenticity of suicide notes) and civil law (e.g., copyright disputes) (Chaski, 2005; Grant, 2007), computer forensics (e.g., identifying the authors of source code of malicious software) (Frantzeskou, Stamatatos, Gritzalis, & Katsikas, 2006), in addition to the traditional application to literary research (e.g., attributing anonymous or disputed literary works to known authors) (Burrows, 2002; Hoover, 2004a). Hence, (roughly) the last decade can be viewed as a new era of authorship analysis technology, this time dominated by efforts to develop practical applications dealing with real-world texts (e.g., e-mails, blogs, online forum messages, source code, etc.)…”
Section: Introduction (mentioning; confidence: 99%)
“…This is followed by a detailed description of the SCAP approach, including conclusions of the method as described in previous work (Frantzeskou et al., 2005, 2006). Also, this subsection includes a discussion about the high-level features that might influence authorship identification.…”
Section: Related Work on Code Authorship (mentioning; confidence: 99%)
“…The use of Source Code Author Profiles (SCAP) represents a new approach to source code authorship identification and classification that is both highly effective (Frantzeskou et al., 2005, 2006) and language-independent, since it is based on low-level non-metric information. In this method, byte-level n-grams are utilised to establish and assess code against author profiles.…”
Section: Introduction (mentioning; confidence: 99%)
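The profile-based idea summarized in this statement can be sketched as follows. Each candidate's training code is turned into a profile of its L most frequent byte-level n-grams. An unseen file is then assigned to the candidate whose profile shares the most n-grams with the file's own profile, which is the SPI measure described above. This is a minimal sketch under stated assumptions, not the authors' implementation. The defaults n = 3 and L = 1500, the function names, and the choice to join an author's training files with a newline are all hypothetical illustrative choices.

```python
from __future__ import annotations

from collections import Counter


def byte_ngram_profile(source: bytes, n: int = 3, L: int = 1500) -> set[bytes]:
    """Return the L most frequent byte-level n-grams of `source`.

    n and L are hypothetical defaults chosen for illustration only.
    """
    counts = Counter(source[i:i + n] for i in range(len(source) - n + 1))
    return {gram for gram, _ in counts.most_common(L)}


def attribute(unseen: bytes, training: dict[str, list[bytes]],
              n: int = 3, L: int = 1500) -> str | None:
    """Return the candidate whose profile shares the most n-grams with `unseen`.

    Each author's files are joined with b"\\n" (an assumption) before profiling.
    Ties go to the first author in dict order.
    """
    target = byte_ngram_profile(unseen, n, L)
    best_author, best_score = None, -1
    for author, files in training.items():
        profile = byte_ngram_profile(b"\n".join(files), n, L)
        score = len(target & profile)  # SPI: size of the profile intersection
        if score > best_score:
            best_author, best_score = author, score
    return best_author


# Hypothetical usage with two toy candidates.
training = {
    "alice": [b"for (int i = 0; i < n; i++) {\n"],
    "bob": [b"while(x){x--;}\n"],
}
print(attribute(b"for (int j = 0; j < m; j++) {\n", training))
```

Because the features are raw byte sequences rather than parsed tokens, nothing in this sketch depends on the programming language of the input. This is consistent with the language-independence claim in the statement.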
“…Many previous authorship studies focused on analyzing texts in the literature ([18], [19], [33], [40] and [50]), program code ([27], [37] and [45]) and online messages ([1], [8], [11], [22] and [25]). However, with the growth of web applications and social networks, studies in the last decade have focused on analyzing online messages (e-mails, blogs, forum…) rather than literary texts [56].…”
Section: Introduction (mentioning; confidence: 99%)