On question and answer sites, such as Stack Overflow (SO), developers use tags to label the content of a post and to support developers in question searching and browsing. However, these tags mainly refer to technological aspects instead of the purpose of the question. Tagging questions with their purpose can add a new dimension to the identification of discussed topics in posts on SO. In this paper, we aim at automating the classification of SO question posts into seven question categories. As a first step, we harmonized existing taxonomies of question categories and then, we manually classified 1,000 SO questions according to our new taxonomy. Additionally to the question category, we marked the phrases that indicate a question category for each of the posts. We then use this data set to automate the classification of posts using two approaches. For the first approach, we manually analyzed the phrases to find patterns. Based on regular expressions, we implemented a classifier, for each of the categories, that determines whether a post belongs to a category. These regular expressions are derived by analyzing patterns in the phrases. In the second approach, we use the curated data set to train classification models of supervised machine learning algorithms (Random Forest and Support Vector Machines). For the machine learning algorithms, we experimented with 1,312 different configurations regarding the preprocessing of the text and the representation of the input data. Then, we compared the performance of the regex approach with the performance of the best configuration that uses machine learning algorithms on a validation set of 110 posts. The results show that using the regular expression approach, we can classify posts into the correct question category with an average precision and recall of 0.90, and an MCC of 0.68. Additionally, we applied the regex approach on all questions of SO that deal with Android app development and investigated the co-occurrence of question categories in posts. We found that the categories API USAGE, CONCEPTUAL, and DISCREPANCY are the most frequently assigned question categories and that they also occur together frequently. Our approach can be used to support developers in browsing SO discussions or researchers in building recommender systems based on SO.
Changes, directly followed by Dependency Changes, and Build Changes. Changes to version properties and to dependencies can be found frequently among the top 10 most frequent change types. The top 10 change types account for 73% of all of the changes. (RQ2) When are build changes recorded?Build changes are not equally distributed over the projects' timeline. There are particular phases that contain significantly more build changes than others. We observe that especially around releases, the frequency of build changes is high. This work makes the following contributions: (1) an approach and a corresponding prototype implementation to extract fine-grained build changes from MAVEN build files, (2) a dataset containing historical build changes of 30 open source Java projects, (3) an evaluation of the performance of this tool, (4) two empirical studies of the frequency and time of build changes, and (5) a replication package that contains supplementary material. 1 The remainder of the paper is organized as follows: Section II situates the paper with respect to the related work. Section III presents our BUILDDIFF approach. Section IV describes the data that we used for this study and evaluates the performance of BUILDDIFF, and discusses its strengths and weaknesses. Section V presents the first study on the frequency of build changes and Section VI shows the second study on when build changes occur. Section VII discusses the implications of our results and threats to validity. Section VIII concludes the paper.
Build systems are an essential part of modern software projects. As software projects change continuously, it is crucial to understand how the build system changes because neglecting its maintenance can, at best, lead to expensive build breakage, or at worst, introduce user-reported defects due to incorrectly compiled, linked, packaged, or deployed official releases. Recent studies have investigated the (co-)evolution of build configurations and reasons for build breakage; however, the prior analysis focused on a coarse-grained outcome (i.e., either build changing or not). In this paper, we present BuildDiff, an approach to extract detailed build changes from Maven build files and classify them into 143 change types. In a manual evaluation of 400 build-changing commits, we show that BuildDiff can extract and classify build changes with average precision, recall, and f1-scores of 0.97, 0.98, and 0.97, respectively. We then present two studies using the build changes extracted from 144 open source Java projects to study the frequency and time of build changes. The results show that the top-10 most frequent change types account for 51% of the build changes. Among them, changes to version numbers and changes to dependencies of the projects occur most frequently. We also observe frequently co-occurring changes, such as changes to the source code management definitions, and corresponding changes to the dependency management system and the dependency declaration. Furthermore, our results show that build changes frequently occur around release days. In particular, critical changes, such as updates to plugin configuration parts and dependency insertions, are performed before a release day. The contributions of this paper lay in the foundation for future research, such as for analyzing the (co-)evolution of build files with other artifacts, improving effort estimation approaches by incorporating necessary modifications to the build system specification, or automatic repair approaches for configuration code. Furthermore, our detailed change information enables improvements of refactoring approaches for build configurations and improvements of prediction models to identify error-prone build files.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.