Explainable source code authorship attribution algorithm

Bogdanova, A. D.; Romanov, Vitaly

doi:10.1088/1742-6596/2134/1/012011

Cited by 3 publications

(5 citation statements)

References 25 publications


“…The accuracy of this method was 42% on 400 Python authors. Bogdanova and Romanov [114] addressed the inability to add new authors without retraining and the lack of interpretability in source code authorship attribution. They trained a convolutional neural network to generate the vector representation for files.…”

Section: Deep Learning Models (mentioning)

confidence: 99%

“…Bogdanova and Romanov [114] presented the Saliency map algorithm for the Source Code Authorship Attribution (SCAA), which is interpretable. It assigned an importance value to each input parameter.…”

Section: Explanation Of Attribution Through Features (mentioning)

confidence: 99%
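
The statement above describes a saliency map as assigning an importance value to each input. A minimal sketch of that idea follows, assuming a hypothetical toy scoring function (`toy_score`) in place of the paper's network, and a finite-difference gradient estimate in place of backpropagation. None of these names come from the cited work.

```python
def saliency(score_fn, inputs, eps=1e-4):
    """Estimate how sensitive score_fn is to each input value.

    Uses a central finite difference as a stand-in for the gradient
    that a neural network framework would compute by backpropagation.
    """
    importances = []
    for i in range(len(inputs)):
        up = list(inputs)
        down = list(inputs)
        up[i] += eps
        down[i] -= eps
        grad = (score_fn(up) - score_fn(down)) / (2 * eps)
        # The magnitude of the gradient is taken as the importance.
        importances.append(abs(grad))
    return importances


def toy_score(x):
    # Hypothetical linear "model": the third input has no influence.
    weights = [2.0, -0.5, 0.0]
    return sum(w * v for w, v in zip(weights, x))


values = saliency(toy_score, [1.0, 1.0, 1.0])
print(values)
```

In this toy setup the first input receives the largest importance and the third receives approximately none, which is the kind of per-input ranking a saliency map is meant to expose.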

“…There is a large accuracy variation ranging from 42% to 94% in Refs. [111][112][113][114]. Its highest performance slightly trails behind RFC paired with NN-generated code representations.…”

Section: Accuracy (mentioning)

confidence: 99%


Authorship Attribution Methods, Challenges, and Future Research Directions: A Comprehensive Survey

He, Lashkari, Vombatkere et al. 2024

Information


Over the past few decades, researchers have put their effort and paid significant attention to the authorship attribution field, as it plays an important role in software forensics analysis, plagiarism detection, security attack detection, and protection of trade secrets, patent claims, copyright infringement, or cases of software theft. It helps new researchers understand the state-of-the-art works on authorship attribution methods, identify and examine the emerging methods for authorship attribution, and discuss their key concepts, associated challenges, and potential future work that could help newcomers in this field. This paper comprehensively surveys authorship attribution methods and their key classifications, used feature types, available datasets, model evaluation criteria and metrics, and challenges and limitations. In addition, we discuss the potential future research directions of the authorship attribution field based on the insights and lessons learned from this survey work.



“…In their recent research, He et al [35] provided a comprehensive examination of the methods, models, datasets, feature types, and evaluation metrics employed in author attribution studies conducted for both source code and English text. The survey included two deep learning studies [36,37] focused on source code author attribution. These studies introduce interpretable models and introduce the concept of saliency maps to enhance model interpretability.…”

Section: Deep Learning Architectures (mentioning)

confidence: 99%

“…The initial model [36] achieved a 42% accuracy, utilizing an NN for embedding projection, alongside tSNE for visualization and the KNN classification algorithm for predictions. Similarly, the second study [37] utilized a CNN for embedding generation and, like its predecessor, employed a KNN classification method, resulting in a 70% accuracy rate. For authorship identification on text, various approaches including multiheaded RNNs [38] have shown promising results.…”

Section: Deep Learning Architectures (mentioning)

confidence: 99%
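
The statements above describe a pipeline in which a network produces an embedding per file and a KNN classifier predicts the author, which is also what allows new authors to be added without retraining. A minimal sketch of that second stage follows, assuming the embeddings already exist. The two-dimensional vectors, the author names, and the helper names (`cosine_distance`, `knn_predict`) are illustrative assumptions, not taken from the cited papers.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    # Assumes both vectors are non-zero.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_predict(index, query, k=3):
    """Majority vote among the k labeled embeddings closest to query.

    index is a list of (embedding, author) pairs.
    """
    nearest = sorted(index, key=lambda item: cosine_distance(item[0], query))[:k]
    votes = Counter(author for _, author in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical embeddings for two known authors.
index = [
    ([1.0, 0.1], "alice"),
    ([0.9, 0.2], "alice"),
    ([0.1, 1.0], "bob"),
    ([0.2, 0.9], "bob"),
]

query = [-0.8, 0.0]
before = knn_predict(index, query)

# Adding a new author only appends labeled vectors; nothing is retrained.
index.append(([-1.0, 0.1], "carol"))
index.append(([-0.9, -0.1], "carol"))
after = knn_predict(index, query)

print(before, after)
```

The design point is that the classifier holds no learned parameters of its own: the set of recognizable authors is whatever the index currently contains.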

Authorship Attribution in Less-Resourced Languages: A Hybrid Transformer Approach for Romanian

Nitu, Dascalu 2024

Applied Sciences


Authorship attribution for less-resourced languages like Romanian, characterized by the scarcity of large, annotated datasets and the limited number of available NLP tools, poses unique challenges. This study focuses on a hybrid Transformer combining handcrafted linguistic features, ranging from surface indices like word frequencies to syntax, semantics, and discourse markers, with contextualized embeddings from a Romanian BERT encoder. The methodology involves extracting contextualized representations from a pre-trained Romanian BERT model and concatenating them with linguistic features, selected using the Kruskal–Wallis mean rank, to create a hybrid input vector for a classification layer. We compare this approach with a baseline ensemble of seven machine learning classifiers for authorship attribution employing majority soft voting. We conduct studies on both long texts (full texts) and short texts (paragraphs), with 19 authors and a subset of 10. Our hybrid Transformer outperforms existing methods, achieving an F1 score of 0.87 on the full dataset of the 19-author set (an 11% enhancement) and an F1 score of 0.95 on the 10-author subset (an increase of 10% over previous research studies). We conduct linguistic analysis leveraging textual complexity indices and employ McNemar and Cochran’s Q statistical tests to evaluate the performance evolution across the best three models, while highlighting patterns in misclassifications. Our research contributes to diversifying methodologies for effective authorship attribution in resource-constrained linguistic environments. Furthermore, we publicly release the full dataset and the codebase associated with this study to encourage further exploration and development in this field.


Learning Explainable Multi-view Representations for Malware Authorship Attribution

Adam, Waagen, Warmsley et al. 2023

2023 IEEE International Conference on Big Data (BigData)
