<p>In an era of rapid growth in cloud computing, measuring the performance of cloud services is essential to assuring quality of service. Nevertheless, analyzing cloud service performance effectively is difficult because of the complexity of cloud resources and the diversity of Big Data applications. Hence, we propose to examine the performance of Big Data applications on Hadoop and thereby characterize performance in a cloud cluster. Hadoop is built on MapReduce, one of the most widely used programming models in Big Data. In this paper, a performance analysis of the Hadoop MapReduce WordCount application on Twitter data is presented. A 4-node in-house Hadoop cluster was set up and an experiment was carried out to analyze its performance. From this work, it was concluded that Hadoop is efficient for Big Data applications with 3 or more nodes and a replication factor of 3. It was also observed that system time was relatively high compared to user time for Big Data applications beyond 80 GB. The experiment also revealed certain patterns in the actual data blocks used to process the WordCount application. </p>
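For readers unfamiliar with the WordCount workload mentioned above, the following is a minimal in-memory sketch of the map, shuffle, and reduce phases in Python. It is not the authors' Hadoop job and does not use any Hadoop API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample tweets are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(tweet: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated token."""
    for word in tweet.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the emitted counts by word, as the framework would between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the grouped counts to get one total per word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(tweets: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce sequentially on a single machine."""
    pairs = (pair for tweet in tweets for pair in map_phase(tweet))
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    # Hypothetical sample input, standing in for the Twitter data set.
    sample = ["Hadoop is fast", "hadoop is scalable"]
    print(word_count(sample))
```

On a real cluster the map and reduce steps run in parallel across nodes and the shuffle moves data over the network, which is where the node count and replication factor studied in the paper come into play.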
The dynamic behavior of distributed systems requires that their performance characteristics be determined rigorously, preferably in the early stages of the software engineering process. Evaluating performance at the end of software development increases the cost of design changes. To compare design alternatives or to identify system bottlenecks, quantitative system analysis must be carried out from the early stages of the software development life cycle. In this paper we describe a process model, the Hybrid Performance Prediction Process Model, that allows modeling and evaluating distributed systems with the explicit goal of assessing the performance of the software system during the feasibility study. The use case performance engineering approach proposed in this paper exploits the use case model and provides the flexibility to integrate the software performance prediction process with the software engineering process. We use an e-parking application to demonstrate various elements of our framework. The performance metrics are obtained and analyzed by considering two software architectures. Sensitivity analysis on the behavior of resources is carried out. This analysis helps to determine the capacity the execution environment needs to meet the defined performance objectives.
Standalone systems cannot handle the very large traffic loads generated by Twitter due to memory constraints. A parallel computational environment provided by Apache Hadoop can distribute and process the data over different destination systems. In this paper, a Hadoop cluster with four nodes, integrated with RHadoop, Flume, and Hive, is created to analyze the tweets gathered from the Twitter stream. Twitter stream data relevant to an event or topic, such as IPL-2015, cricket, Royal Challengers Bangalore, Kohli, and Modi, is collected from May 24 to 30, 2016 using Flume. Hive is used as a data warehouse to store the streamed tweets. Twitter analytics such as the maximum number of tweets by users, the average number of followers, and the maximum number of friends are obtained using Hive. A network graph is constructed from each user's unique screen name and mentions using 'R'. A timeline graph of individual users is also generated using 'R'. In addition, the proposed solution analyzes the emotions of cricket fans by classifying their Twitter messages into appropriate emotional categories using the optimized support vector neural network (OSVNN) classification model. To attain better classification accuracy, the performance of the SVNN is enhanced using a chimp optimization algorithm (ChOA). Extracting users' emotions toward an event is beneficial for prediction, and it becomes more powerful when coupled with visualizations. A bar chart and a word cloud are generated to visualize the results of the emotion analysis.