Context: Families of experiments (i.e., groups of experiments with the same goal) are on the rise in Software Engineering (SE). Selecting unsuitable aggregation techniques to analyze families may undermine their potential to provide in-depth insights from experiments' results. Objectives: Identifying the techniques used to aggregate experiments' results within families in SE. Raising awareness of the importance of applying suitable aggregation techniques to reach reliable conclusions within families. Method: We conduct a systematic mapping study (SMS) to identify the aggregation techniques used to analyze families of experiments in SE. We outline the advantages and disadvantages of each aggregation technique according to mature experimental disciplines such as medicine and pharmacology. We provide preliminary recommendations to analyze and report families of experiments in view of families' common limitations with regard to joint data analysis. Results: Several aggregation techniques have been used to analyze SE families of experiments, including Narrative synthesis, Aggregated Data (AD), Individual Participant Data (IPD) mega-trial or stratified, and Aggregation of p-values. The rationale used to select aggregation techniques is rarely discussed within families. Families of experiments are commonly analyzed with unsuitable aggregation techniques according to the literature of mature experimental disciplines. Conclusion: Data analysis' reporting practices should be improved to increase the reliability and transparency of joint results. AD and IPD stratified appear to be suitable to analyze SE families of experiments.
Existing empirical studies on test-driven development (TDD) report different conclusions about its effects on quality and productivity. Very few of those studies are experiments conducted with software professionals in industry. We aim to analyse the effects of TDD on the external quality of the work done and the productivity of developers in an industrial setting. We conducted an experiment with 24 professionals from three different sites of a software organization. We chose a repeated-measures design, and asked subjects to implement TDD and incremental test last development (ITLD) in two simple tasks and a realistic application close to real-life complexity. To analyse our findings, we applied a repeatedmeasures general linear model procedure and a linear mixed effects procedure. We did not observe a statistical difference between the quality of the work done by subjects in both treatments. We observed that the subjects are more productive when they implement TDD on a simple task compared to ITLD, but the productivity drops significantly when applying TDD to a complex brownfield task. So, the task complexity significantly obscured the effect of TDD. Further evidence is necessary to conclude whether TDD is better or worse than ITLD in terms of external quality and productivity in an industrial setting. We found that experimental factors such as selection of tasks could dominate the findings in TDD studies. E3 Ayse Tosun
Researchers from different groups and institutions are collaborating on building groups of experiments by means of replication (i.e., conducting groups of replications). Disparate aggregation techniques are being applied to analyze groups of replications. The application of unsuitable techniques to aggregate replication results may undermine the potential of groups of replications to provide in-depth insights from experiment results. Objectives: Provide an analysis procedure with a set of embedded guidelines to aggregate software engineering (SE) replication results. Method: We compare the characteristics of groups of replications for SE and other mature experimental disciplines such as medicine and pharmacology. In view of their differences, the limitations with regard to the joint data analysis of groups of SE replications and the guidelines provided in mature experimental disciplines to analyze groups of replications, we build an analysis procedure with a set of embedded guidelines specifically tailored to the analysis of groups of SE replications. We apply the proposed analysis procedure to a representative group of SE replications to illustrate its use. Results: All the information contained within the raw data should be leveraged during the aggregation of replication results. The analysis procedure that we propose encourages the use of stratified individual participant data and aggregated data in tandem to analyze groups of SE replications. Conclusion: The aggregation techniques used to analyze groups of replications should be justified in research articles. This will increase the reliability and transparency of joint results. The proposed guidelines should ease this endeavor.
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Context: Recent developments in natural language processing have facilitated the adoption of chatbots in typically collaborative software engineering tasks (such as diagram modelling). Families of experiments can assess the performance of tools and processes and, at the same time, alleviate some of the typical shortcomings of individual experiments (e.g., inaccurate and potentially biased results due to a small number of participants). Objective: Compare the usability of a chatbot for collaborative modelling (i.e., SOCIO) and an online web tool (i.e., Creately). Method: We conduct a family of three experiments to evaluate the usability of SOCIO against the Creately online collaborative tool in academic settings. Results: The student participants were faster at building class diagrams using the chatbot than with the online collaborative tool and more satisfied with SOCIO. Besides, the class diagrams built using the chatbot tended to be more concise -albeit slightly less complete. Conclusion: Chatbots appear to be helpful for building class diagrams. In fact, our study has helped us to shed light on the future direction for experimentation in this field and lays the groundwork for researching the applicability of chatbots in diagramming.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.