Scalable test data generation from multidimensional models

Torlak, Emina

doi:10.1145/2393596.2393637

Cited by 16 publications

(7 citation statements)

References 37 publications


“…Another data generation approach from outside the UML community is TestBlox [79]. TestBlox aims to produce large, valid, and statistically representative data sets from multidimensional models to test the performance of big-data computing platforms.…”

Section: Related Work (mentioning)

confidence: 99%

Practical Constraint Solving for Generating System Test Data

Soltana

Sabetzadeh

Briand

2020

ACM Trans. Softw. Eng. Methodol.


The ability to generate test data is often a necessary prerequisite for automated software testing. For the generated data to be fit for its intended purpose, the data usually has to satisfy various logical constraints. When testing is performed at a system level, these constraints tend to be complex and are typically captured in expressive formalisms based on first-order logic. Motivated by improving the feasibility and scalability of data generation for system testing, we present a novel approach, whereby we employ a combination of metaheuristic search and Satisfiability Modulo Theories (SMT) for constraint solving. Our approach delegates constraint solving tasks to metaheuristic search and SMT in such a way as to take advantage of the complementary strengths of the two techniques. We ground our work on test data models specified in UML, with OCL used as the constraint language. We present tool support and an evaluation of our approach over three industrial case studies. The results indicate that, for complex system test data generation problems, our approach presents substantial benefits over the state of the art in terms of applicability and scalability.



“…We note that many approaches generate large data sets to evaluate the performance of big data computing platforms [1,3,4,13,23,26,27]. These approaches are fundamentally different from test data generation approaches whose goal is to find faults that may exist in big data programs.…”

Section: Related Work (mentioning)

confidence: 99%

Applying combinatorial test data generation to big data applications

Lei

Khan

et al. 2016

Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering


Big data applications (e.g., Extract, Transform, and Load (ETL) applications) are designed to handle great volumes of data. However, processing such great volumes of data is time-consuming. There is a need to construct small yet effective test data sets during agile development of big data applications. In this paper, we apply a combinatorial test data generation approach to two real-world ETL applications at Medidata. In our approach, we first create Input Domain Models (IDMs) automatically by analyzing the original data source and incorporating constraints manually derived from requirements. Next, the IDMs are used to create test data sets that achieve t-way coverage, which has been shown to be very effective in detecting software faults. The generated test data sets also satisfy all the constraints identified in the first step. To avoid creating IDMs from scratch when there is a change to the original data source or constraints, our approach extends the original IDMs with additional information. The new IDMs, which we refer to as Adaptive IDMs (AIDMs), are updated by comparing the changes against the additional information, and are then used to generate new test data sets. We implement our approach in a tool, called comBinatorial bIg daTa Test dAta Generator (BIT-TAG). Our experience shows that combinatorial testing can be effectively applied to big data applications. In particular, the test data sets created using our approach for the two ETL applications are only a small fraction of the original data source, but we were able to detect all the faults found with the original data source. CCS Concepts: Software and its engineering → Software testing and debugging.


“…The reason is that the subtle correlations between attributes are often not captured. Another line of work [2,3,4,15,9,18] addresses this problem by considering a richer set of constraints, e.g., generating a database given a workload of queries such that each intermediate result has a certain size. The constraints are typically specified in a declarative language and the use of constraint solvers is very common in these works.…”

Section: Related Work (mentioning)

confidence: 99%

“…Recent work [2,9,18] has proposed generating workload-aware datasets with the help of constraint solvers. However, these do not scale well to the amounts of data typically present in a customer dataset.…”

Section: Introduction (mentioning)

confidence: 99%

Reversing statistics for scalable test databases generation

Shen

Antova

2013

Proceedings of the Sixth International Workshop on Testing Database Systems


Testing the performance of database systems is commonly accomplished using synthetic data and workload generators such as TPC-H and TPC-DS. Customer data and workloads are hard to obtain due to their sensitive nature and prohibitively large sizes. As a result, oftentimes the data management systems are not properly tested before releasing, and performance-related bugs are commonly discovered after deployment, when the cost of fixing is very high. In this paper we propose RSGen, an approach to generating datasets out of customer metadata information, including schema, integrity constraints and statistics. RSGen enables generation of data that closely matches the customer environment, and is fast, scalable and extensible.

