Abstract: MapReduce is a paradigm that allows parallel processing of large amounts of data. MapReduce programs, combined with their underlying run-time framework, have distinctive features that are prone to unexpected behaviors not present in other types of programs. This paper describes an approach to functional testing of MapReduce programs based on a hierarchical classification of potential faults that may occur in MapReduce programs over Hadoop. This classification, called MRTree, is then used to derive test cases capable of detecting the faults represented in MRTree, and is illustrated with examples.
New processing models are being adopted in Big Data Engineering to overcome the limitations of traditional technology. Among them, MapReduce stands out by allowing the processing of large volumes of data over a distributed infrastructure that can change during runtime. The developer only designs the functionality of the program, and its execution is managed by a distributed system. As a consequence, a program can behave differently at each execution because it is automatically adapted to the resources available at each moment. Therefore, when the program has a design fault, this fault could be revealed in some executions and masked in others. During testing, these faults are usually masked because the test infrastructure is stable; they are only revealed in production, where the environment is more aggressive, with infrastructure failures among other causes. This paper proposes new testing techniques aimed at detecting these design faults by simulating different infrastructure configurations. The testing techniques generate a representative set of infrastructure configurations that, as a whole, are more likely to reveal failures, using Random testing, and Partition testing together with Combinatorial testing. The techniques are automated by a test execution engine called MRTest that is able to detect these faults using only the test input data, regardless of the expected output. Our empirical evaluation shows that MRTest can automatically detect these design faults within a reasonable time.
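The idea of exercising one test input under many infrastructure configurations can be sketched as follows. This is a minimal simulation, not the actual MRTest engine: the `run_mapreduce` helper, the word-count job, and the choice of configuration parameters (number of input splits and reducers, combined exhaustively with `itertools.product`) are illustrative assumptions.

```python
import itertools
from collections import defaultdict

def run_mapreduce(records, mapper, reducer, n_splits, n_reducers):
    """Simulate one MapReduce execution: split input, map, shuffle, reduce."""
    # Partition the input into n_splits chunks (simulating input splits).
    splits = [records[i::n_splits] for i in range(n_splits)]
    # Map phase: each split is processed independently.
    mapped = [kv for split in splits for rec in split for kv in mapper(rec)]
    # Shuffle phase: group values by key across n_reducers partitions.
    partitions = [defaultdict(list) for _ in range(n_reducers)]
    for k, v in mapped:
        partitions[hash(k) % n_reducers][k].append(v)
    # Reduce phase: one output pair per key.
    out = {}
    for part in partitions:
        for k, vs in part.items():
            out[k] = reducer(k, vs)
    return out

def word_count_mapper(line):
    return [(w, 1) for w in line.split()]

def word_count_reducer(key, values):
    return sum(values)

data = ["a b a", "b c", "a c c"]
# A combinatorially generated set of infrastructure configurations.
configs = list(itertools.product([1, 2, 3], [1, 2]))  # splits x reducers
outputs = [run_mapreduce(data, word_count_mapper, word_count_reducer, s, r)
           for s, r in configs]
# A well-designed program yields the same output under every configuration;
# any disagreement reveals a design fault without needing an expected output.
all_agree = all(o == outputs[0] for o in outputs)
```

Note that the check compares configurations against each other, so no oracle for the expected output is required, matching the abstract's claim that only the test input data is needed.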
Aim This study examines perceptions of the implementation of the National Council Licensing Examination in Canada through a content analysis of articles in the media. Background Public opinions of nursing in the media have been acknowledged as important for the profession, specifically in relation to their portrayal of nursing. Introduction The Canadian Council of Registered Nurse Regulators began using the US‐based National Council Licensing Examination (widely known as NCLEX) as the entry examination for Canada's registered nurses, discontinuing the previous Canadian Registered Nurse Examination in 2015. Methods A qualitative content analysis was conducted of media reports that emerged following adoption of the National Council Licensing Examination in Canada, highlighting the image of nursing portrayed in the media during this key regulatory policy change. Results Release of the examination results for the first three quarters of 2015 identified a much lower overall Canadian pass rate than with the previous exam. Media reports highlight differences in perception of the examination between Canadian regulators and other stakeholders in the context of the examination experiences reported and test results. Issues around applicability of the examination to Canadian nursing practice, curriculum alignment, language translation concerns and stakeholder engagement were identified. Discussion The implementation of the National Council Licensing Examination in Canada highlighted a lack of communication among nursing stakeholders in the country. Conclusions Most of the media reporting has been negative and poses a reputational risk to the Canadian nursing profession. Implications for Nursing Policy This change in the licensing requirement has significant policy implications for nursing in Canada and globally.
Issues such as appropriate examination translation, access to appropriate test preparation materials, assurance that the examination reflects distinctive aspects of a country's healthcare system and the need for stakeholder engagement were identified.
Summary Context MapReduce is a processing model used in Big Data to facilitate the analysis of large data under a distributed architecture. Objective The aim of this study is to identify and categorize the state of the art of software testing in MapReduce applications, determining trends and gaps. Method A systematic mapping study was performed to discuss and classify, according to international standards, 54 relevant studies in relation to reasons for testing, types of testing, quality characteristics, test activities, tools, roles, processes, test levels, and research validations. Results The principal reasons for testing MapReduce applications are performance issues, potential failures, issues related to the data, or satisfying agreements with efficient resources. The efforts are focused on performance and, to a lesser degree, on functionality. Performance testing is carried out through simulation and evaluation, whereas functional testing considers some program characteristics (such as specification and structure). Regardless of the type of testing, the majority of efforts are focused at the unit and integration test levels of the specific MapReduce functions, without considering other parts of the technology stack. Conclusions Researchers have both opportunities and challenges in performance and functional testing, and there is room to improve their research through the use of mature and standard validation methods.
MapReduce is a parallel data processing paradigm oriented to processing large volumes of information in data-intensive applications, such as Big Data environments. A characteristic of these applications is that they can have different data sources and data formats. For these reasons, the inputs could contain some poor-quality data that could produce a failure if the program functionality does not properly handle the variety of input data. The output of these programs is obtained from a number of input transformations that represent the program logic. This paper proposes a testing technique called MRFlow that is based on data flow test criteria and oriented to the analysis of transformations between the input and the output in order to detect defects in MapReduce programs. MRFlow is applied over several MapReduce programs and detects several defects.
Abstract: Programs that process a large volume of data generally run in a distributed and parallel architecture, such as programs implemented in the MapReduce processing model. In these programs, developers can abstract away the infrastructure where the program will run and focus on the functional issues. However, the infrastructure configuration and its state cause different parallel executions of the program, and some of these could result in functional faults that are hard to reveal. In general, the infrastructure that executes the program is not considered during testing, because tests usually contain little input data and parallelization is therefore not necessary. In this paper a testing technique is proposed that generates different infrastructure configurations for a given test input, and then executes the program under these configurations in order to reveal functional faults. This testing technique is automated by a test engine and applied in a case study. As a result, several infrastructure configurations are automatically generated and executed for a test case, revealing a functional fault that is then fixed by the developer.
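A classic example of a fault that the infrastructure can mask or reveal is per-split pre-aggregation that is not algebraically safe. The sketch below is an illustrative assumption, not a program from the paper: it computes a per-key average by averaging per-split averages, which is only correct when every split carries the same weight, so the fault stays hidden with one input split (the usual small-test setting) and surfaces as soon as the data is split unevenly.

```python
from collections import defaultdict

def faulty_avg(records, n_splits):
    """Per-key average with a (faulty) per-split pre-aggregation:
    each split emits its local mean, and the reducer averages the means."""
    splits = [records[i::n_splits] for i in range(n_splits)]
    partials = defaultdict(list)
    for split in splits:
        local = defaultdict(list)
        for k, v in split:
            local[k].append(v)
        for k, vs in local.items():
            # Fault: an average of averages ignores each split's weight.
            partials[k].append(sum(vs) / len(vs))
    return {k: sum(ms) / len(ms) for k, ms in partials.items()}

data = [("t", 10), ("t", 20), ("t", 60)]
# One split (a typical unit test) masks the fault: the result is correct.
single_split = faulty_avg(data, 1)
# Two splits reveal it: the output now depends on the infrastructure.
two_splits = faulty_avg(data, 2)
```

Running the same test input under both configurations exposes the disagreement (30.0 vs 27.5 here), which is exactly the kind of configuration-dependent behavior the proposed technique is designed to surface.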
Abstract: Big Data programs are those that process large data exceeding the capabilities of traditional technologies. Among newly proposed processing models, MapReduce stands out as it allows the analysis of schema-less data in large distributed environments with frequent infrastructure failures. Functional faults in MapReduce are hard to detect in a testing/preproduction environment due to its distributed characteristics. We propose an automatic test framework implementing a novel testing approach called Ex Vivo. The framework employs data from production but executes the tests in a laboratory to avoid side effects on the application. Faults are detected automatically, without human intervention, by checking whether the same data would generate different outputs under different infrastructure configurations. The framework (MrExist) is validated with a real-world program. MrExist can identify a fault in a few seconds, after which the program can be stopped, not only avoiding an incorrect output but also saving the money, time and energy of production resources.
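The oracle-free detection idea can be captured in a few lines: replay the same data under several configurations and flag a fault as soon as two of them disagree. The helper and toy job below are illustrative assumptions, not MrExist's actual API; the toy job concatenates each key's values in arrival order, a common MapReduce design fault because the order of reduce-side values is not guaranteed.

```python
from collections import defaultdict

def differential_check(job, data, configs):
    """Ex Vivo-style, oracle-free check: run the same data under several
    infrastructure configurations and report a fault when any two disagree."""
    baseline = job(data, configs[0])
    return any(job(data, cfg) != baseline for cfg in configs[1:])

# Toy job standing in for a MapReduce program: it concatenates each key's
# values in processing order, which (incorrectly) depends on how the input
# is split, i.e. on the infrastructure rather than on the data.
def order_dependent_job(records, n_splits):
    grouped = defaultdict(list)
    for split in (records[i::n_splits] for i in range(n_splits)):
        for k, v in split:
            grouped[k].append(v)
    return {k: "".join(vs) for k, vs in grouped.items()}

data = [("k", "a"), ("k", "b"), ("k", "c")]
fault_found = differential_check(order_dependent_job, data, configs=[1, 2])
```

Because the check needs no expected output, it can run continuously on production data in a laboratory copy of the job, matching the abstract's claim of fault detection without human intervention.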