Jie M. Zhang scite author profile

Mutation testing realises the idea of using artificial defects to support testing activities. Mutation is typically used as a way to evaluate the adequacy of test suites, to guide the generation of test cases and to support experimentation.Mutation has reached a maturity phase and gradually gains popularity both in academia and in industry. This chapter presents a survey of recent advances, over the past decade, related to the fundamental problems of mutation testing and sets out the challenges and open problems for the future development of the method. It also collects advices on best practices related to the use of mutation in empirical studies of software testing. Thus, giving the reader a 'mini-handbook'-style roadmap for the application of mutation testing as experimental methodology.

show abstract

Machine Learning Testing: Survey, Landscapes and Horizons

Zhang

Harman

Ма

et al. 2022

IIEEE Trans. Software Eng.

432

263

View full text Add to dashboard Cite

This paper provides a comprehensive survey of Machine Learning Testing (ML testing) research. It covers 128 papers on testing properties (e.g., correctness, robustness, and fairness), testing components (e.g., the data, learning program, and framework), testing workflow (e.g., test generation and test evaluation), and application scenarios (e.g., autonomous driving, machine translation). The paper also analyses trends concerning datasets, research trends, and research focus, concluding with research challenges and promising research directions in ML testing. Index Terms-machine learning, software testing, deep neural network, ! • Jie M. Zhang and Mark Harman are with CREST, University College London, United Kingdom. Mark Harman is also with Facebook London.

show abstract

Automatic testing and improvement of machine translation

Sun

Zhang

Harman

et al. 2020

View full text Add to dashboard Cite

This paper presents TransRepair, a fully automatic approach for testing and repairing the consistency of machine translation systems. TransRepair combines mutation with metamorphic testing to detect inconsistency bugs (without access to human oracles). It then adopts probability-reference or cross-reference to post-process the translations, in a grey-box or black-box manner, to repair the inconsistencies. Our evaluation on two state-of-the-art translators, Google Translate and Transformer, indicates that TransRepair has a high precision (99%) on generating input pairs with consistent translations. With these tests, using automatic consistency metrics and manual assessment, we find that Google Translate and Transformer have approximately 36% and 40% inconsistency bugs. Black-box repair fixes 28% and 19% bugs on average for Google Translate and Transformer. Grey-box repair fixes 30% bugs on average for Transformer. Manual inspection indicates that the translations repaired by our approach improve consistency in 87% of cases (degrading it in 2%), and that our repairs have better translation acceptability in 27% of the cases (worse in 8%).

show abstract

Fairea: a model behaviour mutation approach to benchmarking bias mitigation methods

Hort

Zhang

Sarro

et al. 2021

View full text Add to dashboard Cite

The increasingly wide uptake of Machine Learning (ML) has raised the significance of the problem of tackling bias (i.e., unfairness), making it a primary software engineering concern. In this paper, we introduce Fairea, a model behaviour mutation approach to benchmarking ML bias mitigation methods. We also report on a largescale empirical study to test the effectiveness of 12 widely-studied bias mitigation methods. Our results reveal that, surprisingly, bias mitigation methods have a poor effectiveness in 49% of the cases. In particular, 15% of the mitigation cases have worse fairness-accuracy trade-offs than the baseline established by Fairea; 34% of the cases have a decrease in accuracy and an increase in bias.Fairea has been made publicly available for software engineers and researchers to evaluate their bias mitigation methods. CCS CONCEPTS• Software and its engineering → Software creation and management; Extra-functional properties.

show abstract

"Ignorance and Prejudice" in Software Fairness

Zhang

Harman

2021

View full text Add to dashboard Cite

Machine learning software can be unfair when making human-related decisions, having prejudices over certain groups of people. Existing work primarily focuses on proposing fairness metrics and presenting fairness improvement approaches. It remains unclear how key aspect of any machine learning system, such as feature set and training data, affect fairness. This paper presents results from a comprehensive study that addresses this problem. We find that enlarging the feature set plays a significant role in fairness (with an average effect rate of 38%). Importantly, and contrary to widely-held beliefs that greater fairness often corresponds to lower accuracy, our findings reveal that an enlarged feature set has both higher accuracy and fairness. Perhaps also surprisingly, we find that a larger training data does not help to improve fairness. Our results suggest a larger training data set has more unfairness than a smaller one when feature sets are insufficient; an important cautionary finding for practising software engineers.

show abstract

Machine Learning Testing: Survey, Landscapes and Horizons

Zhang

Harman

Ма

et al. 2019

Preprint

View full text Add to dashboard Cite

A Study of Bug Resolution Characteristics in Popular Programming Languages

Zhang

Hao

et al. 2021

IIEEE Trans. Software Eng.

View full text Add to dashboard Cite

Improving machine translation systems via isotopic replacement

Sun

Zhang

Xiong

et al. 2022

View full text Add to dashboard Cite

Machine translation plays an essential role in people's daily international communication. However, machine translation systems are far from perfect. To tackle this problem, researchers have proposed several approaches to testing machine translation. A promising trend among these approaches is to use word replacement, where only one word in the original sentence is replaced with another word to form a sentence pair. However, precise control of the impact of word replacement remains an outstanding issue in these approaches.To address this issue, we propose CAT, a novel word-replacementbased approach, whose basic idea is to identify word replacement with controlled impact (referred to as isotopic replacement). To achieve this purpose, we use a neural-based language model to encode the sentence context, and design a neural-network-based algorithm to evaluate context-aware semantic similarity between two words. Furthermore, similar to TransRepair, a state-of-the-art word-replacement-based approach, CAT also provides automatic fixing of revealed bugs without model retraining.Our evaluation on Google Translate and Transformer indicates that CAT achieves significant improvements over TransRepair. In particular, 1) CAT detects seven more types of bugs than TransRepair; 2) CAT detects 129% more translation bugs than TransRepair; 3) CAT repairs twice more bugs than TransRepair, many of which may bring serious consequences if left unfixed; and 4) CAT has better efficiency than TransRepair in input generation (0.01s v.s. 0.41s) and comparable efficiency with TransRepair in bug repair (1.92s v.s. 1.34s).

show abstract

12 3 4

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

334 Leonard St

Brooklyn, NY 11211

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Jie M. Zhang

Mutation Testing Advances: An Analysis and Survey

Machine Learning Testing: Survey, Landscapes and Horizons

Automatic testing and improvement of machine translation

Fairea: a model behaviour mutation approach to benchmarking bias mitigation methods

"Ignorance and Prejudice" in Software Fairness

Machine Learning Testing: Survey, Landscapes and Horizons

A Study of Bug Resolution Characteristics in Popular Programming Languages

Improving machine translation systems via isotopic replacement

Contact Info

Product

Resources

About