2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)
DOI: 10.1109/msr.2019.00015
Semantic Source Code Models Using Identifier Embeddings

Abstract: The emergence of online open source repositories in recent years has led to an explosion in the volume of openly available source code, coupled with metadata that relate to a variety of software development activities. As an effect, in line with recent advances in machine learning research, software maintenance activities are switching from symbolic formal methods to data-driven methods. In this context, the rich semantics hidden in source code identifiers provide opportunities for building semantic repres…

Cited by 18 publications (14 citation statements)
References 37 publications
“…Similarly, (Pradel and Sen 2018) proposed DeepBugs to identify name-based bug detection using semantic representations of code. Likewise, (Efstathiou and Spinellis 2019) proposed distributed code representations for six different programming languages: Java, Python, PHP, C, C++, and C#. They used fastText for learning semantic representations and studied dissimilarities between code and natural language, proposing various applications and limitations.…”
Section: Related Work
confidence: 99%
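As context for this statement, the general technique it describes (learning subword-aware embeddings over source code identifiers with fastText) can be sketched in a few lines. The sketch below is a minimal illustration, assuming gensim's FastText implementation and a toy identifier list; it is not the pipeline used by Efstathiou and Spinellis (2019).

# Minimal sketch: subword-aware embeddings over source code identifiers.
# Assumes gensim is installed; the identifiers below are a toy stand-in for
# tokens mined from real repositories.
import re
from gensim.models import FastText

def split_identifier(name):
    """Split camelCase and snake_case identifiers into lowercase subtokens."""
    parts = re.split(r"_|(?<=[a-z0-9])(?=[A-Z])", name)
    return [p.lower() for p in parts if p]

# Each "sentence" here is the subtoken sequence of one identifier; in practice
# it would be the token stream of a whole function or file.
corpus = [
    split_identifier(tok)
    for tok in ["getUserName", "user_name", "fetchUserProfile",
                "parseJsonResponse", "json_decode", "readFileContents"]
]

# fastText learns character n-gram vectors alongside word vectors.
model = FastText(sentences=corpus, vector_size=50, window=3,
                 min_count=1, sg=1, epochs=50)

print(model.wv.most_similar("user", topn=3))

Because fastText represents words through character n-grams, the resulting model can also embed identifiers that never occurred in the training corpus, which is one reason it suits the open vocabulary of source code.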
“…For instance, we started with the metrics suggested by Pimentel and colleagues (2019) and others (Biswas et al, 2019) such as the length of notebook titles, the placement of imports, the presence of dependency requirements files, and the use of relative paths to access the data. Similarly, for the DL-based approach, we adopted a highly successful approach that has been developed recently by the DL community: automatically learn suitable representations, i.e., embeddings (Efstathiou & Spinellis, 2019). Such representations are known to improve the performance of downstream learning tasks or applications such as contextual search and analogical reasoning in the case of natural language semantics.…”
Section: Introduction
confidence: 99%
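The downstream uses named in this statement, contextual search and analogical reasoning, reduce to nearest-neighbour and vector-arithmetic queries over the learned embeddings. The following is a self-contained toy sketch, again assuming gensim; the subtoken corpus is invented for illustration and is not data from any of the cited works.

# Illustration only: how vector queries over identifier embeddings support
# contextual search and analogy-style reasoning. Real pipelines train on
# millions of identifiers mined from repositories.
from gensim.models import FastText

corpus = [["read", "file", "contents"], ["write", "file", "buffer"],
          ["parse", "json", "response"], ["decode", "json", "payload"],
          ["get", "user", "name"], ["fetch", "user", "profile"]]
model = FastText(sentences=corpus, vector_size=50, window=3,
                 min_count=1, sg=1, epochs=100)

# Contextual search: nearest neighbours of a query subtoken.
print(model.wv.most_similar("json", topn=3))

# Analogy query, word2vec-style: read is to file as parse is to ...?
print(model.wv.most_similar(positive=["parse", "file"],
                            negative=["read"], topn=3))

With a corpus this small the outputs are noisy; the point is only the shape of the queries that embedding-based tooling builds on.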
“…There are many studies on the representation of source code, including recent studies proposing distributed representations for identifiers [17], APIs [46,47], and software libraries [56]. A comprehensive survey of learning the representation of source code has been done by Allamanis et al [1].…”
Section: Related Work
confidence: 99%
“…The naturalness hypothesis of software approaches the subject in a similar way and asserts that although programming languages, in theory, are complex, flexible and powerful, the code fragments that real people actually write are mostly simple and rather repetitive, and thus they have usefully predictable statistical properties that can be captured in statistical language models and leveraged for software engineering tasks [26]. For example, following the word embeddings concept in NLP, the authors of [27] generated a set of general-purpose models pre-trained over large amounts of code. Although the authors claimed that their models could be used to assist a number of information retrieval tasks, including identifying semantic errors, they did not provide any experimental results for these tasks.…”
Section: Related Work
confidence: 99%