2020
DOI: 10.1007/s42979-020-00281-1

Comparison of Image-Based and Text-Based Source Code Classification Using Deep Learning

Abstract: Source code classification (SCC) is the task of assigning source code to categories according to a criterion such as functionality, programming language, or vulnerability. Many source code archives are organized by programming language, so desired code fragments can be retrieved easily by searching within the archive. However, manually organizing source code archives by field experts is labor-intensive and impractical because of the fast-growing available sour…

Cited by 8 publications (13 citation statements)
References 24 publications
“…With respect to starting from scratch, our approach offers the benefits of cheaper (re)training, reducing maintenance costs. Considering the very marginal reduction in precision in comparison to previous work (≤ 1.5% with respect to Kiyak et al (2020)), which is probably in large part imputable to the much higher language diversity in our experiments, the pros/cons balance seems to tilt towards pretrained CNNs and transfer learning. In this respect it seems worth to explore side-tuning (Zhang et al, 2020), a recent technique for transfer learning which consists in adapting a pre-trained network by training a lightweight ''side'' network that is then fused with the (unchanged) pre-trained network via summation.…”
Section: Discussion (contrasting)
confidence: 70%
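The statement above describes side-tuning as training a lightweight "side" network that is fused with a frozen pre-trained network via summation. A minimal sketch of that fusion idea, using toy numpy networks: all function names, shapes, and the blending parameter `alpha` are illustrative assumptions, not taken from Zhang et al. (2020) or the cited paper.

```python
import numpy as np

def base_network(x, W_base):
    """Stand-in for the pre-trained network; its weights stay frozen."""
    return np.tanh(x @ W_base)

def side_network(x, W_side):
    """Stand-in for the lightweight trainable 'side' adapter."""
    return np.tanh(x @ W_side)

def side_tuned_forward(x, W_base, W_side, alpha=0.5):
    """Fuse the frozen base output with the side output by weighted
    summation; during adaptation only W_side (and possibly alpha)
    would be updated, leaving the pre-trained weights unchanged."""
    return alpha * base_network(x, W_base) + (1 - alpha) * side_network(x, W_side)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))        # a batch of 4 toy inputs
W_base = rng.normal(size=(16, 8))   # "pre-trained" frozen weights
W_side = np.zeros((16, 8))          # side network starts at zero
y = side_tuned_forward(x, W_base, W_side)
print(y.shape)  # (4, 8)
```

With the side weights at zero, the fused output is just the scaled base output, which is why side-tuning can start from the pre-trained model's behavior and adapt cheaply.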
“…Image-based PLI Kiyak et al (2020) compared several image- and text-based approaches to Programming Language Identification (PLI). At a glance, Table 1 in their work reports that the maximum diversity supported by image-based PLI among surveyed works was 8 languages, reached by the same authors in Kiyak et al (2020) with an accuracy of 93.5% on a dataset of 40 K files. We achieve comparable performances (92% precision and recall) with much higher language diversity (149 languages) and on a larger dataset (300 K snippets).…”
Section: Related Work (mentioning)
confidence: 99%