According to the National Security Agency, the Internet processes 1,826 petabytes (PB) of data per day [1]. In 2018, the amount of data produced every day was 2.5 quintillion bytes [2]. The International Data Corporation (IDC) previously estimated that the amount of generated data would double every two years [3]; in fact, 90% of all of the world's data was generated in the last two years alone [2]. Google now processes more than 40,000 searches every second, or 3.5 billion searches per day, and Facebook users upload 300 million photos, 510,000 comments, and 293,000 status updates per day [2, 4]. Needless to say, the amount of data generated on a daily basis is staggering. As a result, techniques are required to analyze and understand this massive amount of data, as it is a great source from which to derive useful information.
Massive datasets are quickly becoming a concern for many industries. For example, many web-based applications must be able to handle petabytes' worth of transactions on a daily basis and, moreover, must be able to quickly and efficiently act upon the data in each transaction. As a result, providing testing capabilities for such applications becomes a challenge of scale. We argue that existing approaches, such as automated test suite generation, may not necessarily scale without assistance. To this end, we discuss open issues and possible solutions specific to testing big data applications.

CCS Concepts: • Software and its engineering → Software testing and debugging; Search-based software engineering; Software system structures

Keywords: big data, search-based software testing, test suite generation

OVERVIEW

Many techniques are currently being developed for generating datasets of massive scale (i.e., big data) for use in validating applications [1]. However, there is little published research on testing applications that already interact with big data [9]. Moreover, even fewer publications explore how search-based software testing (SBST) techniques can be used to optimize testing strategies [6, 8]. As such, research is needed on testing big data applications to determine both the feasibility and applicability of existing testing techniques to such applications.

For example, consider a nationwide healthcare network that centralizes medical records for all patients. Such a system deals with an enormous amount of data as well as an amalgam of heterogeneous systems and devices. It can enable a patient to visit their primary care physician, receive a prescription for treatment with a specialist in another state, and then enable that specialist to instantly retrieve the entirety of the patient's medical history. Specialized applications will therefore need to be developed to handle the dataset, including optimizations for querying and retrieving specific data. However, such applications may not be effectively tested by existing strategies, given the wide range of values that may manifest. This position paper therefore argues for an examination of how big data impacts existing testing strategies, focusing on automated test suite generation.

Traditionally, software testing has been considered an ideal field for the application of search-based heuristics, such as genetic algorithms [7]. Notable systems include EvoSuite [5] and Nighthawk [2] for the automated generation of test suites and the instantiation of unit tests, respectively. Given the optimization problems that typically comprise a software testing strategy (e.g., test suite generation, test case prioritization and selection), search-based heuristics have been shown to quickly and efficiently arrive at an optimal solution. However, many industries are moving towards the big data paradigm, where petabytes of data must be considered at run time. As such, a strategy such as test suite generation may be cost-prohibitive, given t...
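To make the optimization framing concrete, the sketch below shows how a genetic algorithm might evolve a test suite toward a coverage objective, in the spirit of the search-based techniques cited above. The encoding of test cases as sets of covered branches, the fitness function, and all parameter values are illustrative assumptions for this sketch, not the actual design of EvoSuite, Nighthawk, or any other tool.

```python
# Minimal sketch: a genetic algorithm that evolves test suites toward
# branch coverage. All encodings and parameters are illustrative assumptions.
import random

POP_SIZE = 20        # candidate test suites per generation
SUITE_SIZE = 10      # test cases per suite
GENERATIONS = 50
MUTATION_RATE = 0.1
NUM_BRANCHES = 100   # assume the system under test has 100 coverage targets


def random_test_case():
    """A test case is abstracted here as the set of branches it covers."""
    return frozenset(random.sample(range(NUM_BRANCHES), k=random.randint(1, 10)))


def random_suite():
    return [random_test_case() for _ in range(SUITE_SIZE)]


def fitness(suite):
    """Fitness = number of distinct branches covered by the whole suite."""
    covered = set()
    for case in suite:
        covered |= case
    return len(covered)


def crossover(parent_a, parent_b):
    """Single-point crossover over the list of test cases."""
    point = random.randint(1, SUITE_SIZE - 1)
    return parent_a[:point] + parent_b[point:]


def mutate(suite):
    """Occasionally replace a random test case with a freshly generated one."""
    if random.random() < MUTATION_RATE:
        suite = list(suite)
        suite[random.randrange(SUITE_SIZE)] = random_test_case()
    return suite


def evolve():
    population = [random_suite() for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        population.sort(key=fitness, reverse=True)
        survivors = population[: POP_SIZE // 2]          # elitist selection
        offspring = [
            mutate(crossover(random.choice(survivors), random.choice(survivors)))
            for _ in range(POP_SIZE - len(survivors))
        ]
        population = survivors + offspring
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print(f"Best suite covers {fitness(best)} of {NUM_BRANCHES} branches")
```

In a big data setting, the expensive step is not the evolutionary loop itself but evaluating fitness, since each candidate suite may need to execute against petabyte-scale inputs; this is precisely where a naive application of test suite generation may become cost-prohibitive.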
Alzheimer's disease is a progressive illness that affects more than 5.5 million people in the United States with no effective cure or treatment. Symptoms of the disease include declines in memory and speech abilities and increases in aggression and insomnia. Recent research suggests that NLP techniques can detect early cognitive decline as well as monitor the rate of decline over time. The processed data can be used in a smart home environment to enhance the level of home care for Alzheimer's patients. This paper proposes early-stage research in software engineering and natural language processing for quantifying and evaluating the patient's cognitive state to determine the required level of support in a smart home.