Context: Machine learning (ML) may enable effective automated test generation.
Objectives: We characterize emerging research in this area, examining testing practices, researcher goals, the ML techniques applied, evaluation methods, and open challenges.
Methods: We perform a systematic literature review on a sample of 97 publications.
Results: ML generates input for system, GUI, unit, performance, and combinatorial testing, or improves the performance of existing generation methods. ML is also used to generate test oracles, including test verdict, property-based, and expected output oracles. Supervised learning (often based on neural networks) and reinforcement learning (often based on Q-learning) are common, and some publications also employ unsupervised or semi-supervised learning. Supervised, semi-supervised, and unsupervised approaches are evaluated using both traditional testing metrics and ML-related metrics (e.g., accuracy), while reinforcement learning is often evaluated using testing metrics tied to the reward function.
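To make the reinforcement learning setting concrete, the following is a minimal, purely illustrative sketch (not drawn from any of the reviewed publications) of tabular Q-learning driving test input selection, with the reward tied to newly covered branches as described above. The toy application model, its states, actions, and coverage labels are all hypothetical.

```python
# Illustrative sketch: tabular Q-learning for test input generation.
# Reward = number of branches newly covered by the chosen action.
import random
from collections import defaultdict

# Hypothetical model of the application under test:
# state -> action -> (next state, branches exercised)
APP = {
    "menu": {"open": ("form", {"b1"}), "quit": ("menu", set())},
    "form": {"fill": ("form", {"b2"}), "submit": ("done", {"b3", "b4"})},
    "done": {"quit": ("menu", set())},
}

ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2
Q = defaultdict(float)   # Q[(state, action)] -> estimated value
covered = set()          # branches covered so far across all episodes

for episode in range(50):
    state = "menu"
    for step in range(10):
        actions = list(APP[state])
        # Epsilon-greedy policy: explore occasionally, otherwise exploit Q.
        if random.random() < EPSILON:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[(state, a)])
        next_state, branches = APP[state][action]
        reward = len(branches - covered)  # reward only *new* coverage
        covered |= branches
        best_next = max(Q[(next_state, a)] for a in APP[next_state])
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

print("covered branches:", sorted(covered))
```

Note how the reward function doubles as the evaluation signal: the coverage achieved during learning is itself the testing metric, which is why reinforcement learning approaches are typically assessed with testing metrics rather than ML metrics such as accuracy.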
Conclusion: Work to date shows great promise, but open challenges remain regarding training data, retraining, scalability, evaluation complexity, the ML algorithms employed (and how they are applied), benchmarks, and replicability. Our findings can serve as a roadmap and inspiration for researchers in this field.