Mark Harman scite author profile

We introduce Sapienz, an approach to Android testing that uses multi-objective search-based testing to automatically explore and optimise test sequences, minimising length, while simultaneously maximising coverage and fault revelation. Sapienz combines random fuzzing, systematic and search-based exploration, exploiting seeding and multi-level instrumentation. Sapienz significantly outperforms (with large effect size) both the state-of-the-art technique Dynodroid and the widely-used tool, Android Monkey, in 7/10 experiments for coverage, 7/10 for fault detection and 10/10 for fault-revealing sequence length. When applied to the top 1,000 Google Play apps, Sapienz found 558 unique, previously unknown crashes. So far we have managed to make contact with the developers of 27 crashing apps. Of these, 14 have confirmed that the crashes are caused by real faults. Of those 14, six already have developer-confirmed fixes. CCS Concepts •Software and its engineering → Software testing and debugging; Search-based software engineering;

show abstract

Mutation Testing Advances: An Analysis and Survey

Papadakis

et al. 2019

View full text Add to dashboard Cite

Mutation testing realises the idea of using artificial defects to support testing activities. Mutation is typically used as a way to evaluate the adequacy of test suites, to guide the generation of test cases and to support experimentation.Mutation has reached a maturity phase and gradually gains popularity both in academia and in industry. This chapter presents a survey of recent advances, over the past decade, related to the fundamental problems of mutation testing and sets out the challenges and open problems for the future development of the method. It also collects advices on best practices related to the use of mutation in empirical studies of software testing. Thus, giving the reader a 'mini-handbook'-style roadmap for the application of mutation testing as experimental methodology.

show abstract

Machine Learning Testing: Survey, Landscapes and Horizons

Zhang

Harman

Ма

et al. 2022

IIEEE Trans. Software Eng.

431

263

View full text Add to dashboard Cite

This paper provides a comprehensive survey of Machine Learning Testing (ML testing) research. It covers 128 papers on testing properties (e.g., correctness, robustness, and fairness), testing components (e.g., the data, learning program, and framework), testing workflow (e.g., test generation and test evaluation), and application scenarios (e.g., autonomous driving, machine translation). The paper also analyses trends concerning datasets, research trends, and research focus, concluding with research challenges and promising research directions in ML testing. Index Terms-machine learning, software testing, deep neural network, ! • Jie M. Zhang and Mark Harman are with CREST, University College London, United Kingdom. Mark Harman is also with Facebook London.

show abstract

A survey of the use of crowdsourcing in software engineering

Mao

Capra

Harman

et al. 2017

Journal of Systems and Software

294

229

View full text Add to dashboard Cite

Crowdsourcing can be used to support software engineering activities and research into these activities. In this paper we provide a comprehensive survey of the use of crowdsourcing to support software engineering activities (Crowdsourced Software Engineering), seeking to cover all literature on this topic. We describe the software engineering domains, tasks and applications for crowdsourcing and the platforms and stakeholders involved in realising Crowdsourced Software Engineering solutions. We also expose trends, issues and opportunities for Crowdsourced Software Engineering.Please cite as: Ke Mao, Licia Capra, Mark Harman and Yue Jia. A Survey of the Use of Crowdsourcing in SoftwareEngineering.

show abstract

A Survey of App Store Analysis for Software Engineering

Martin

Sarro

Jia

et al. 2017

IIEEE Trans. Software Eng.

314

231

View full text Add to dashboard Cite

App Store Analysis studies information about applications obtained from app stores. App stores provide a wealth of information derived from users that would not exist had the applications been distributed via previous software deployment methods. App Store Analysis combines this non-technical information with technical information to learn trends and behaviours within these forms of software repositories. Findings from App Store Analysis have a direct and actionable impact on the software teams that develop software for app stores, and have led to techniques for requirements engineering, release planning, software design, security and testing. This survey describes and compares the areas of research that have been explored thus far, drawing out common aspects, trends and directions future research should take to address open problems and challenges.

show abstract

Multi-objective software effort estimation

2016

View full text Add to dashboard Cite

We introduce a bi-objective effort estimation algorithm that combines Confidence Interval Analysis and assessment of Mean Absolute Error. We evaluate our proposed algorithm on three different alternative formulations, baseline comparators and current state-of-the-art effort estimators applied to five real-world datasets from the PROMISE repository, involving 724 different software projects in total. The results reveal that our algorithm outperforms the baseline, state-of-the-art and all three alternative formulations, statistically significantly (p < 0.001) and with large effect size (Â12 ≥ 0.9) over all five datasets. We also provide evidence that our algorithm creates a new state-of-the-art, which lies within currently claimed industrial human-expert-based thresholds, thereby demonstrating that our findings have actionable conclusions for practicing software engineers.

show abstract

Comparing white-box and black-box test prioritization

et al. 2016

View full text Add to dashboard Cite

Although white-box regression test prioritization has been well-studied, the more recently introduced black-box prioritization approaches have neither been compared against each other nor against more well-established white-box techniques. We present a comprehensive experimental comparison of several test prioritization techniques, including wellestablished white-box strategies and more recently introduced black-box approaches. We found that Combinatorial Interaction Testing and diversity-based techniques (Input Model Diversity and Input Test Set Diameter) perform best among the black-box approaches. Perhaps surprisingly, we found little difference between black-box and white-box performance (at most 4% fault detection rate difference). We also found the overlap between black-and white-box faults to be high: the first 10% of the prioritized test suites already agree on at least 60% of the faults found. These are positive findings for practicing regression testers who may not have source code available, thereby making white-box techniques inapplicable. We also found evidence that both black-box and white-box prioritization remain robust over multiple system releases. CCS Concepts •Software and its engineering → Software testing and debugging;

show abstract

Threats to the validity of mutation-based test assessment

et al. 2016

View full text Add to dashboard Cite

Much research on software testing and test techniques relies on experimental studies based on mutation testing. In this paper we reveal that such studies are vulnerable to a potential threat to validity, leading to possible Type I errors; incorrectly rejecting the Null Hypothesis. Our findings indicate that Type I errors occur, for arbitrary experiments that fail to take countermeasures, approximately 62% of the time. Clearly, a Type I error would potentially compromise any scientific conclusion. We show that the problem derives from such studies' combined use of both subsuming and subsumed mutants. We collected articles published in the last two years at three leading software engineering conferences. Of those that use mutation-based test assessment, we found that 68% are vulnerable to this threat to validity.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

334 Leonard St

Brooklyn, NY 11211

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Mark Harman

Sapienz: multi-objective automated testing for Android applications

Mutation Testing Advances: An Analysis and Survey

Machine Learning Testing: Survey, Landscapes and Horizons

A survey of the use of crowdsourcing in software engineering

A Survey of App Store Analysis for Software Engineering

Multi-objective software effort estimation

Comparing white-box and black-box test prioritization

Threats to the validity of mutation-based test assessment

Contact Info

Product

Resources

About