Abstract: A fundamental problem in finding applications that are highly relevant to development tasks is the mismatch between the high-level intent reflected in the descriptions of these tasks and the low-level implementation details of applications. To reduce this mismatch, we created an approach called Exemplar (EXEcutable exaMPLes ARchive) for finding highly relevant software projects in large archives of applications. After a programmer enters a natural-language query that contains high-level concepts (e.g., MIME, data …
“…We assess the efficiency of the engines through the Mean Reciprocal Rank (MRR), a statistical metric used to evaluate a process that produces a list of possible responses to a query [18]. The reciprocal rank of a query is the multiplicative inverse of the rank of the first relevant answer.…”
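The MRR computation described above can be sketched directly: each query's reciprocal rank is the multiplicative inverse of the rank of the first relevant answer, and MRR averages this over all queries. This is a generic illustration of the metric, not the evaluated engines' code; the example queries and relevance sets are made up.

```python
def reciprocal_rank(results, relevant):
    """Multiplicative inverse of the rank of the first relevant answer (0 if none)."""
    for rank, item in enumerate(results, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(queries):
    """queries: list of (ranked_results, relevant_set) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)

# Toy example: first relevant answer at ranks 1, 3, and 2
# -> MRR = (1 + 1/3 + 1/2) / 3 = 11/18
queries = [
    (["a", "b"], {"a"}),        # RR = 1
    (["x", "y", "z"], {"z"}),   # RR = 1/3
    (["p", "q"], {"q"}),        # RR = 1/2
]
print(mean_reciprocal_rank(queries))  # ≈ 0.611
```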
Section: RQ3: Comparison Against General Search Engines
Source code terms such as method names and variable types are often different from conceptual words mentioned in a search query. This vocabulary mismatch problem can make code search inefficient. In this paper, we present COde voCABUlary (CoCaBu), an approach to resolving the vocabulary mismatch problem when dealing with free-form code search queries. Our approach leverages common developer questions and the associated expert answers to augment user queries with the relevant, but missing, structural code entities in order to improve the performance of matching relevant code examples within large code repositories. To instantiate this approach, we build GitSearch, a code search engine, on top of GitHub and Stack Overflow Q&A data. We evaluate GitSearch in several dimensions to demonstrate that (1) its code search results are correct with respect to user-accepted answers; (2) the results are qualitatively better than those of existing Internet-scale code search engines; (3) our engine is competitive against web search engines, such as Google, in helping users solve programming tasks; and (4) GitSearch provides code examples that are acceptable or interesting to the community as answers for Stack Overflow questions.
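The query-augmentation idea can be illustrated with a minimal sketch: retrieve a similar Q&A pair, extract structural code entities from the accepted answer, and append them to the user's free-form query. The toy Q&A index, the word-overlap retrieval, and the CamelCase heuristic are all simplifying assumptions for illustration; CoCaBu's actual matching and entity extraction are more sophisticated.

```python
import re

# Hypothetical miniature Q&A index: question text -> accepted-answer code snippet.
QA_INDEX = {
    "how to read a file line by line in java":
        "BufferedReader br = new BufferedReader(new FileReader(path));",
}

def extract_code_entities(snippet):
    # Grab CamelCase identifiers as stand-ins for structural code entities.
    return sorted(set(re.findall(r"\b[A-Z][a-z]+(?:[A-Z][a-z]+)+\b", snippet)))

def augment_query(query):
    # Pick the stored question sharing the most words with the query (toy retrieval).
    q_words = set(query.lower().split())
    best = max(QA_INDEX, key=lambda q: len(q_words & set(q.split())))
    return query + " " + " ".join(extract_code_entities(QA_INDEX[best]))

print(augment_query("read a file line by line"))
# -> "read a file line by line BufferedReader FileReader"
```

The augmented query now contains identifiers that actually appear in source code, so a term-matching search over a code repository has something concrete to match against.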
“…The terms added can come from a variety of thesauruses [66], rule systems mapping keywords to related terms [28], related Java documentation [41], or from the code the developer is currently writing [12]. For example, Lemos et al. [66] found that, when queries were automatically expanded with synonyms from the WordNet [135] thesaurus, the recall of CodeGenie [65] increased by 30% (i.e., query expansion allowed CodeGenie to return more on-topic results that otherwise would not have been returned).…”
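Synonym-based query expansion of this kind can be sketched in a few lines. The hand-made synonym table below is a stand-in for a real thesaurus lookup such as WordNet; the entries are illustrative assumptions, not WordNet's actual output.

```python
# Toy synonym table standing in for a WordNet lookup (entries are illustrative;
# a real system would query the thesaurus at runtime).
SYNONYMS = {
    "delete": ["remove", "erase"],
    "file": ["document"],
}

def expand_query(query):
    """Append known synonyms of each query term, keeping the original terms first."""
    terms = query.lower().split()
    expanded = list(terms)
    for t in terms:
        expanded.extend(SYNONYMS.get(t, []))
    return expanded

print(expand_query("delete file"))
# -> ['delete', 'file', 'remove', 'erase', 'document']
```

Expanding the query increases recall, because results phrased with any synonym now match, at some risk to precision when a synonym carries an unintended sense.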
Section: Improving Ranking Algorithms With Automatic Query Modification
“…Since programs contain API calls with precisely defined semantics, these API calls can serve as semantic anchors for computing the degree of similarity between requirements and artifacts, by matching the semantics that applications express through these API calls. Programmers routinely use third-party API calls (e.g., the Java Development Kit (JDK)) to implement various requirements [10,21,30,31,47]. API calls from well-known and widely used libraries have precisely defined semantics, unlike the names of program variables, types, and the words that programmers use in comments.…”
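One simple way to use API calls as semantic anchors is to compare the sets of API calls two artifacts invoke, e.g. with Jaccard similarity. This is a minimal sketch of the idea, not the cited approach's actual ranking scheme, and the example call lists are made up.

```python
def api_similarity(calls_a, calls_b):
    """Jaccard similarity over the sets of API calls two artifacts invoke.
    Shared well-defined API calls act as semantic anchors between them."""
    a, b = set(calls_a), set(calls_b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Two hypothetical applications described by the JDK APIs they call.
app1 = ["javax.mail.Transport.send", "java.io.FileReader.read"]
app2 = ["javax.mail.Transport.send", "java.util.zip.ZipFile.entries"]
print(api_similarity(app1, app2))  # 1 shared call of 3 distinct -> ≈ 0.333
```

Because fully qualified API names are stable and unambiguous, this comparison avoids the vocabulary drift that plagues matching on variable names or comment words.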
Abstract: Software traceability is the ability to describe and follow the life of a requirement in both a forward and backward direction by defining relationships to related development artifacts. A plethora of different traceability recovery approaches use information retrieval techniques, which depend on the quality of the textual information in requirements and software artifacts. Not only is it important that stakeholders use meaningful names in these artifacts, but it is also crucial that the same names are used to specify the same concepts in different artifacts. Unfortunately, the latter is difficult to enforce, and as a result software traceability approaches are not as efficient and effective as they could be, to the point where it is questionable whether the anticipated economic and quality benefits were indeed achieved. We propose a novel and automatic approach for expanding corpora with relevant documentation that is obtained using external function call documentation and sets of relevant words, which we implemented in TraceLab. We experimented with three Java applications, and we show that using our approach the precision of recovering traceability links increased by up to 31% in the best case and by approximately 9% on average.