Fourth International Conference on Information Technology: New Generations (ITNG'07), 2007
DOI: 10.1109/itng.2007.17

A Probabilistic Approach to Source Code Authorship Identification

Abstract: There exists a need for tools to help identify the authorship of source code. This includes situations in which the

Cited by 34 publications (35 citation statements); references 8 publications.

“…Compared to natural languages, authorship attribution for source code is a much younger field. Early works include Spafford and Weeber [23], Pellen [19], Kothari et al [16], and Frantzeskou et al [10]. Recently, Caliskan-Islam et al [5] improved the state-of-the-art by using a combination of syntactic, lexical, and layout features and achieved higher accuracy over a much larger group of programmers than previous works.…”
Section: Related Work (mentioning; confidence: 99%)

“…In [15], the feature extraction technique extracted 170 style-based features including, but not limited to, the number of blank lines, the average sentence length, and the total number of function words. When dealing with source code there are less stylometric features and more lexical features to focus on [21]. In this case an N-Gram based feature extraction technique may be more suitable since this approach is able to capture a trace of style, lexical information, punctuation and capitalization [19].…”
Section: B. Author Identification (mentioning; confidence: 99%)

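The N-gram idea in the statement above can be illustrated with a short sketch. It assumes overlapping character trigrams, keeps the top_k most frequent n-grams of each text as that text's profile, and attributes an unknown sample to the candidate whose profile shares the most n-grams with it (an intersection measure similar in spirit to the simplified profile intersection used by the SCAP method). The names ngram_profile, profile_overlap and attribute, the defaults n=3 and top_k=1500, and the toy samples are illustrative assumptions, not the method of this paper or of any cited work.

```python
from collections import Counter
from typing import Dict, Optional


def ngram_profile(source: str, n: int = 3, top_k: Optional[int] = None) -> Counter:
    """Count overlapping character n-grams; optionally keep only the top_k most frequent."""
    grams = Counter(source[i:i + n] for i in range(len(source) - n + 1))
    if top_k is not None:
        grams = Counter(dict(grams.most_common(top_k)))
    return grams


def profile_overlap(a: Counter, b: Counter) -> int:
    """Number of distinct n-grams the two profiles share (frequencies are ignored)."""
    return len(a.keys() & b.keys())


def attribute(unknown: str, candidates: Dict[str, str], n: int = 3, top_k: int = 1500) -> str:
    """Return the candidate author whose profile overlaps most with the unknown sample."""
    target = ngram_profile(unknown, n, top_k)
    scores = {
        author: profile_overlap(target, ngram_profile(code, n, top_k))
        for author, code in candidates.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical known samples for two authors.
    known = {
        "alice": "for (int i = 0; i < n; i++) {\n    total += i;\n}",
        "bob": "while(x){x--;}",
    }
    print(attribute("for (int j = 0; j < m; j++) {\n    sum += j;\n}", known))  # prints: alice
```

Because the n-grams are taken over raw characters, spacing, punctuation and capitalization habits end up in the profile alongside identifiers, which is the "trace of style" the quoted passage refers to.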
“…This problem is made more difficult by the fact that a different set of metrics may have better performance when considering different groups of authors and therefore needs to be recalculated for each sample set. Kothari et al [13] used entropy filtering to select optimal metrics. Lange et al [14] used a genetic algorithm to achieve similar results.…”
Section: Related Work (mentioning; confidence: 99%)

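The metric-selection step mentioned above can be sketched as follows. The exact entropy filtering of Kothari et al. [13] is not spelled out here, so as a stand-in the sketch ranks discrete metrics by information gain (the entropy of the author labels minus their conditional entropy given the metric value) and keeps the k highest-ranked metrics. The function names, the metric names in the example, and the assumption that metric values are already discretized are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values: Sequence[Hashable], authors: Sequence[Hashable]) -> float:
    """H(author) - H(author | metric value) for one discretized metric."""
    groups: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for value, author in zip(values, authors):
        groups[value].append(author)
    total = len(authors)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(authors) - conditional


def select_metrics(samples: List[Dict[str, Hashable]], authors: Sequence[Hashable], k: int) -> List[str]:
    """Keep the k metrics whose values say the most about who wrote each sample."""
    names = list(samples[0])
    gains = {name: information_gain([s[name] for s in samples], authors) for name in names}
    return sorted(names, key=lambda name: gains[name], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical, pre-discretized metrics for four samples by two authors.
    samples = [
        {"indent_width": 2, "brace_on_newline": 0, "uses_tabs": 0},
        {"indent_width": 2, "brace_on_newline": 1, "uses_tabs": 0},
        {"indent_width": 4, "brace_on_newline": 0, "uses_tabs": 0},
        {"indent_width": 4, "brace_on_newline": 1, "uses_tabs": 0},
    ]
    authors = ["a", "a", "b", "b"]
    print(select_metrics(samples, authors, k=1))  # prints: ['indent_width']
```

Since the ranking depends on which authors are in the sample set, it has to be recomputed for each new group of authors, matching the concern raised in the quoted passage. Continuous metrics would need binning first, and plain information gain tends to favor metrics with many distinct values.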
“…Previous work by Kothari et al [13] examined source code as a text document and identified certain software developer peculiarities that persisted across different projects. Those styles were used to determine the authorship of disputed source code.…”
Section: Introduction (mentioning; confidence: 99%)