Developers often search the web for code examples relevant to their programming tasks. Unfortunately, they face two major problems. First, the search is impaired by a lexical gap between their query (the task description) and the information associated with the solution. Second, the retrieved solution may not be comprehensive, i.e., the code segment might lack a succinct explanation. These problems force developers to browse dozens of documents in order to synthesize an appropriate solution. To address these two problems, we propose CROKAGE (Crowd Knowledge Answer Generator), a tool that takes the description of a programming task (the query) and provides a comprehensive solution for the task. Our solutions contain not only relevant code examples but also their succinct explanations. Our proposed approach expands the task description with relevant API classes from Stack Overflow Q&A threads, thereby mitigating the lexical gap problem. Furthermore, we perform natural language processing on the top-quality answers and return programming solutions that, unlike those of earlier studies, contain both code examples and code explanations. We evaluate our approach using 97 programming queries, half of which were used for training and half for testing, and show that it outperforms six baselines, including the state of the art, by a statistically significant margin. Furthermore, our evaluation with 29 developers using 24 tasks (queries) confirms the superiority of CROKAGE over the state-of-the-art tool in terms of the relevance of the suggested code examples, the benefit of the code explanations, and the overall solution quality (code + explanation).
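The query expansion idea described above can be illustrated with a minimal sketch. This is not CROKAGE's actual implementation: the toy `THREADS` corpus, the `expand_query` function, and the overlap-based scoring are all hypothetical, chosen only to show how appending API class names mined from Q&A threads can bring a natural-language query closer to the vocabulary of code-bearing answers.

```python
import re
from collections import Counter

# Hypothetical toy corpus: (thread title, API classes mentioned in its answers).
# A real system would mine these from Stack Overflow Q&A threads.
THREADS = [
    ("How to read a text file line by line", ["BufferedReader", "FileReader"]),
    ("Read file contents into a string", ["Files", "Paths"]),
    ("Parse a date from a string", ["SimpleDateFormat", "LocalDate"]),
    ("Read large file efficiently", ["BufferedReader", "Files"]),
]


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def expand_query(query, threads=THREADS, top_k=2):
    """Append the top_k API classes from threads whose titles overlap the query.

    Each class is scored by the summed token overlap between the query and
    the titles of the threads that mention it (an assumed, simplistic score).
    """
    query_tokens = tokenize(query)
    scores = Counter()
    for title, api_classes in threads:
        overlap = len(query_tokens & tokenize(title))
        if overlap:
            for api_class in api_classes:
                scores[api_class] += overlap
    # Sort by descending score, then by name, so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    expansion = [name for name, _ in ranked[:top_k]]
    return f"{query} {' '.join(expansion)}" if expansion else query


print(expand_query("read file"))
```

With this toy corpus, the query "read file" gains the class names `BufferedReader` and `Files`, which are terms a developer's task description rarely contains but relevant answers usually do.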
Background: Software libraries and frameworks play an important role in software system development. The appropriate usage of their functionalities/components through their APIs, however, is a challenge for developers. Usually, API documentation, when it exists, is insufficient to assist them in their programming tasks. There are few API documentation writers for the many potential readers, resulting in a lack of explanations and examples concerning different scenarios and perspectives. The interaction of developers on the Web, on the other hand, generates content concerning APIs from different perspectives, which can be used to document APIs, a practice known as crowd documentation.
Methods: In this paper, we present a study regarding the knowledge generated by the crowd on the Stack Overflow question-and-answer website. Our main goal is to understand how the crowd can contribute to API documentation on two programming tasks: how to implement a scenario using an API (how-to-do-it), and how to fix domain-independent bugs in existing code where there was a misunderstanding regarding the usage of an API (debug-corrective). We classified questions available on Stack Overflow by the main concerns of askers, and we used those classified as how-to-do-it and debug-corrective to analyze the coverage of API elements in the discussions related to such questions. Our cases included the well-known and popular Swing and Android APIs.
Results: Our main findings showed that the crowd provides more content for debug-corrective tasks than for how-to-do-it tasks, regardless of the API. Android API elements are discussed more by the crowd than Swing API elements. Moreover, we observed that some API elements are frequently mentioned together in discussions, and that there is a strong association between API coverage on Stack Overflow and its usage in real software systems.
Conclusions: Crowd documentation may not be a complete substitute for official documentation because of its partial coverage, especially for how-to-do-it tasks. However, it can still significantly enhance the existing documentation, especially for the most commonly used API elements, providing code samples and explanations on a large variety of usage nuances. Finally, taking advantage of the high coverage for debug-corrective tasks, a new kind of debugging assistant may be conceived.
Stack Overflow has become a fundamental element of the developer toolset. This growing influence has been accompanied by an effort from the Stack Overflow community to maintain the quality of its content. One of the problems that jeopardizes that quality is the continuous growth of duplicated questions. To solve this problem, prior works focused on automatically detecting duplicated questions. Two important solutions are DupPredictor and Dupe. Despite reporting significant results, neither work makes its implementation publicly available, hindering subsequent works in the scientific literature that rely on them. We conducted an empirical study as a reproduction of DupPredictor and Dupe. Our results, which were not robust across different sets of tools and data sets, show that the barriers to reproducing these approaches are high. Furthermore, when the approaches are applied to more recent data, we observe a performance decay in both of our reproductions in terms of recall-rate over time, as the number of questions increases. Our findings suggest that subsequent works concerning the detection of duplicated questions in Question and Answer communities require more investigation to confirm their findings.
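For readers unfamiliar with the recall-rate metric mentioned above, here is a minimal sketch. It assumes the definition commonly used in duplicate-detection work: the fraction of duplicate questions whose original (master) question appears among the top k candidates returned by the detector. The function name, argument layout, and sample IDs are hypothetical and are not taken from DupPredictor, Dupe, or the reproduction study.

```python
def recall_rate_at_k(rankings, masters, k):
    """Fraction of duplicate questions whose master is in the top-k candidates.

    rankings: dict mapping a duplicate question ID to its ranked candidate IDs.
    masters:  dict mapping a duplicate question ID to its true master ID.
    """
    if not rankings:
        return 0.0
    hits = sum(
        1 for question_id, ranked in rankings.items()
        if masters[question_id] in ranked[:k]
    )
    return hits / len(rankings)


# Hypothetical example: two duplicate questions with ranked candidate lists.
rankings = {"q1": ["a", "b", "c"], "q2": ["x", "y", "z"]}
masters = {"q1": "b", "q2": "z"}

for k in (1, 2, 3):
    print(k, recall_rate_at_k(rankings, masters, k))
```

Under this definition, a growing pool of candidate questions makes it harder for the true master to stay within the top k, which is one plausible reading of the decay over time reported above.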