2009 1st International Symposium on Search Based Software Engineering
DOI: 10.1109/ssbse.2009.18
On the Use of Discretized Source Code Metrics for Author Identification


Cited by 25 publications (26 citation statements) | References 18 publications
“…As discussed in Section 2.2.2, the seven classifier algorithms represented by the machine classifier approaches are case-based reasoning [15], decision trees [18], discriminant analysis variants [12,14,15], nearest-neighbour search [17,19], neural networks [15], Bayesian networks [16], and voting feature intervals [16]. These approaches were published between 1994 [3] and 2009 [19] using either custom-built programs or off-the-shelf software. Our implementation uses the closest available classifier in the Weka machine learning toolkit [41] for each classifier algorithm identified in the literature, as listed in Table VII. In all cases, we used the default Weka parameters for the chosen classifiers, except for the k-nearest-neighbour classifier, which defaults to k = 1; there we used k = 20, which represents 33% of the instances for one run on COLL-T and a lower proportion for the other collections.…”
Section: Machine Learning Algorithms in Weka
confidence: 99%
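The nearest-neighbour classification the statement above describes can be illustrated with a minimal from-scratch sketch (a majority vote over the k closest training instances); the toy metric vectors and labels are invented for illustration and are not the Weka `IBk` implementation or the data the citing authors used:

```python
from collections import Counter

def knn_classify(train, query, k):
    """k-nearest-neighbour majority vote.
    `train` is a list of (feature_vector, author) pairs; the query is
    assigned the most common author among its k closest neighbours."""
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    return Counter(author for _, author in nearest).most_common(1)[0][0]

# Hypothetical discretized source-code-metric vectors for two authors.
train = [([1, 1], "A"), ([1, 2], "A"), ([2, 1], "A"),
         ([8, 8], "B"), ([8, 9], "B"), ([9, 8], "B")]
print(knn_classify(train, [2, 2], k=3))  # A
print(knn_classify(train, [8, 7], k=3))  # B
```

Choosing k as a fixed fraction of the training set (as the citing authors did with k = 20 for 60 instances) trades sensitivity to individual samples against smoothing over author classes.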
“…Then, statistical analysis, machine learning, or similarity measurement methods are used to classify work samples. This paper considers the machine classifier contributions of Krsul, MacDonell, Ding, Kothari, Lange, Elenbogen, and Shevertalov.…”
Section: Introduction
confidence: 99%
“…In reported results, the system achieved an accuracy of 60% for range-based discretization, 70% for frequency-based discretization, and 65% with no discretization. In [9], the author used a variety of metrics, including the number of occurrences of each data type, the cyclomatic complexity, the quantity and quality of comments, the types of variables, and the layout of the code. They are also working on the IDENTIFIED toolkit for automatic extraction of these metrics.…”
Section: Related Work
confidence: 99%
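The two discretization schemes whose accuracies the statement above compares, range-based (equal-width bins) and frequency-based (equal-depth bins), can be sketched as follows; the metric values and bin counts are illustrative, not taken from the cited experiments:

```python
def equal_width_bins(values, n_bins):
    """Range-based discretization: split [min, max] into n_bins
    equal-width intervals and map each value to its bin index."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1  # guard against a constant metric
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def equal_frequency_bins(values, n_bins):
    """Frequency-based discretization: rank the values so that each
    bin receives (roughly) the same number of observations."""
    order = sorted(range(len(values)), key=values.__getitem__)
    per_bin = len(values) / n_bins
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = min(int(rank / per_bin), n_bins - 1)
    return bins

metrics = [1, 2, 3, 4, 100]              # one outlier skews the range
print(equal_width_bins(metrics, 2))      # [0, 0, 0, 0, 1]
print(equal_frequency_bins(metrics, 2))  # [0, 0, 0, 1, 1]
```

The example shows why the two schemes can yield different accuracies: a single outlying metric value collapses almost everything into one equal-width bin, while equal-frequency binning keeps the bins balanced.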
“…In the research process, the code of each author was processed by feature extraction and SVM training. SVM's powerful pattern-recognition capabilities [12] are used to detect software homology. This provides effective help for malware forensics (author tracking) and for resolving copyright disputes [13].…”
Section: SVM (Support Vector Machine Theory)
confidence: 99%
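The feature-extraction-plus-SVM pipeline described above can be sketched with a from-scratch linear SVM trained by hinge-loss sub-gradient descent, a simplified stand-in for the SVM library a real study would use; the two "style" features and the data are invented for illustration:

```python
def train_linear_svm(X, y, epochs=500, lam=0.01, lr=0.1):
    """Minimal linear SVM: sub-gradient descent on the hinge loss
    with L2 regularization. X: feature vectors, y: labels in {-1, +1}."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:   # inside the margin: push towards this point
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:            # satisfied: only apply the regularizer
                w = [wj * (1 - lr * lam) for wj in w]
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical per-author style features, e.g. (comment ratio,
# normalized mean identifier length), one class per author.
X = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8]]
y = [-1, -1, 1, 1]
w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])  # [-1, -1, 1, 1]
```

On this linearly separable toy set the trainer recovers a separating hyperplane; a production authorship study would instead extract many metric features per sample and use a tuned SVM implementation.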