Abstract. Processing large-scale document data sets often involves a very large number of features, one for each term in the document space. Using all of these features in a learning machine is time consuming and can reduce its performance. The feature space may contain many redundant or non-discriminative features; therefore, feature selection techniques are widely used. In this paper, we introduce a hybrid feature selection algorithm that combines filter and wrapper methods and iteratively selects the most competent set of features with an expectation-maximization-based algorithm. The proposed method employs a greedy algorithm for feature selection in each step. The method has been tested on various data sets, and the results are reported in this paper. Its performance, in terms of both accuracy and Normalized Mutual Information, is promising.
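The combination of a filter stage with a greedy wrapper stage can be illustrated with a minimal sketch. It is not the authors' algorithm: the expectation-maximization iteration is omitted, and the filter score (mutual information), the wrapper learner (leave-one-out 1-nearest-neighbour), the function names, and the toy data are all assumptions chosen for illustration.

```python
from collections import Counter
from math import log


def mutual_information(values, labels):
    """Filter score: mutual information (nats) between a discrete feature and the labels."""
    n = len(labels)
    joint = Counter(zip(values, labels))
    count_x, count_y = Counter(values), Counter(labels)
    return sum(c / n * log(c * n / (count_x[x] * count_y[y]))
               for (x, y), c in joint.items())


def loo_accuracy(rows, labels, features):
    """Wrapper score: leave-one-out accuracy of 1-NN (Hamming distance) on `features`."""
    correct = 0
    for i, row in enumerate(rows):
        best_j, best_d = None, None
        for j, other in enumerate(rows):
            if j == i:
                continue
            d = sum(row[f] != other[f] for f in features)
            if best_d is None or d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(rows)


def hybrid_select(rows, labels, keep_top, max_features):
    """Filter down to `keep_top` candidates, then greedily add features while accuracy improves."""
    n_features = len(rows[0])
    scores = [mutual_information([r[f] for r in rows], labels)
              for f in range(n_features)]
    # Filter stage: keep only the highest-scoring candidates.
    candidates = sorted(range(n_features), key=lambda f: scores[f],
                        reverse=True)[:keep_top]
    # Wrapper stage: greedy forward selection driven by the learner's accuracy.
    selected, best = [], 0.0
    while candidates and len(selected) < max_features:
        scored = [(loo_accuracy(rows, labels, selected + [f]), f)
                  for f in candidates]
        # Ties go to the candidate with the higher filter rank.
        acc, f = max(scored, key=lambda t: t[0])
        if acc <= best:
            break  # no candidate improves the learner, so stop
        selected.append(f)
        candidates.remove(f)
        best = acc
    return selected, best


# Toy term-presence data: feature 1 duplicates feature 0, 2 is noise, 3 is constant.
labels = [0, 0, 0, 1, 1, 1]
rows = [
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 1, 1, 1],
]
selected, accuracy = hybrid_select(rows, labels, keep_top=3, max_features=2)
print(selected, accuracy)
```

On this toy data the filter stage discards the constant feature, and the wrapper stage stops after one feature because the redundant duplicate does not improve accuracy.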
Regression testing is a form of software quality assurance (QA) that compares the behavior of a newer version of a software artifact to its earlier, correct behavior and signals the QA engineer when deviations are detected. Automated generation and execution of regression test cases for business process models has large potential in the context of running systems, but powerful tools are required to make it practically feasible, specifically to limit the impact on production systems and to reduce the manual effort required from QA engineers.
In this paper, we present a regression testing automation framework that implements the capture & replay paradigm in the context of BPMN 2.0, a domain-specific language for modeling and executing business processes. The framework employs parallelization techniques and efficient communication patterns to reduce the performance overhead of capturing. Based on inputs from the QA engineer, it manipulates the BPMN 2.0 model before executing tests, both to isolate the tests from external dependencies (e.g., human actors or expensive web services) and to avoid undesired side effects. Finally, it runs a regression detection algorithm and reports the results to the QA engineer.
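The capture & replay idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the framework's implementation and not the jBPM API: a process is modeled as a plain list of uniquely named tasks, and `Task`, `capture`, and `replay` are hypothetical names. External tasks are stubbed during replay with their captured results, and any other task whose output differs from the captured trace is reported as a deviation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Task:
    name: str                      # assumed unique within a process
    run: Callable[[dict], dict]    # maps process variables to updated variables
    external: bool = False         # e.g. a human task or an expensive web service


Trace = List[Tuple[str, dict]]     # (task name, variables after the task)


def capture(process: List[Task], initial: dict) -> Trace:
    """Run the process for real and record the variables after each task."""
    variables, trace = dict(initial), []
    for task in process:
        variables = task.run(dict(variables))
        trace.append((task.name, dict(variables)))
    return trace


def replay(process: List[Task], initial: dict, recorded: Trace) -> List[str]:
    """Re-run against a captured trace and return the names of deviating tasks."""
    recorded_by_name = dict(recorded)
    variables, deviations = dict(initial), []
    for task in process:
        if task.external:
            # Stub the external dependency with its captured result.
            variables = dict(recorded_by_name[task.name])
        else:
            variables = task.run(dict(variables))
            if variables != recorded_by_name.get(task.name):
                deviations.append(task.name)
    return deviations


def must_not_run(variables: dict) -> dict:
    raise RuntimeError("external task must be stubbed during replay")


def compute_total(v: dict) -> dict:
    return {**v, "total": v["price"] * v["qty"]}


# Version 1 of a toy order process, with a human approval step.
v1 = [
    Task("compute_total", compute_total),
    Task("approve", lambda v: {**v, "approved": True}, external=True),
    Task("apply_discount", lambda v: {**v, "total": v["total"] - v["total"] // 10}),
]
# Version 2 changes the discount rule, which the replay should flag.
v2 = [
    Task("compute_total", compute_total),
    Task("approve", must_not_run, external=True),
    Task("apply_discount", lambda v: {**v, "total": v["total"] - v["total"] // 5}),
]

order = {"price": 20, "qty": 3}
trace = capture(v1, order)
print(replay(v1, order, trace))
print(replay(v2, order, trace))
```

Because the approval step is replaced by its captured result, the replay of the second version never contacts the human actor, yet it still reports the changed discount task.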
We have implemented our framework on top of jBPM, a BPMN 2.0-compliant execution engine, and performed functional validations as well as evaluations of its performance and fault tolerance. The results, which indicate an average capturing performance overhead of 3.9%, demonstrate that the implemented framework can be the foundation of a practical regression testing tool for BPMN 2.0 and a key enabler for continuous delivery of business process-driven applications and services.