Proceedings of the 26th Conference on Program Comprehension 2018
DOI: 10.1145/3196321.3196359

Learning lexical features of programming languages from imagery using convolutional neural networks

Abstract: We demonstrate the ability of deep architectures, specifically convolutional neural networks, to learn and differentiate the lexical features of different programming languages presented in coding video tutorials found on the Internet. We analyze over 17,000 video frames containing examples of Java, Python, and other textual and non-textual objects. Our results indicate that not only can computer vision models based on deep architectures be taught to differentiate among programming languages with over 98% accu…

Cited by 28 publications (18 citation statements)
References 13 publications
“…We are currently working on curating additional labeled data for a variety of programming languages, including C++, and R and have begun an initial exploration into differentiating between Python and Java code samples embedded in digital images through a model that can differentiate between multiple languages while learning lexical features in the process [28]. Using this data we will train an ensemble of classifiers for identifying these languages in video and images.…”
Section: Discussion; citation type: mentioning
confidence: 99%
“…Image-based PLI has been attempted by others too. Ott et al have shown how to use CNNs to identify video frames that contain Java code within video programming tutorials (Ott et al, 2018a) (versus frames not showing code at all) and to distinguish frames containing Java from frames containing Python (Ott et al, 2018b). In the present work we consider a much larger set of languages.…”
Section: Related Work; citation type: mentioning
confidence: 99%
“…Similar to our work, Ott et al [19] proposed to use a VGG network to identify whether frames in programming tutorial videos contain source code. They also use deep learning techniques to classify images based on programming language [20] and UML diagrams [21]. In our study, we combine deep learning techniques and traditional computer vision techniques to achieve better performance than Ott et al's approach.…”
Section: Source Code Detection and Extraction In Programming Screencasts; citation type: mentioning
confidence: 99%