Analyzing and processing massive volumes of data in applications such as sensor data, health care and e-Commerce requires big data processing technologies. Extracting useful information from enormous amounts of unstructured data is crucial. As the amount of data grows, more sophisticated pre-processing techniques are required to analyze it. Social networking sites and online shopping sites host a massive volume of online product reviews from a large number of customers [1]. Online product reviews affect 90% of the current e-Commerce market [2]. Customer reviews contribute to product sales to some extent, and a product's life in the market depends on online product recommendations. Online feedback is a communication channel that delivers suggestions directly from customers [3, 4]. Online reviews and ratings from customers are another source of information about product quality [5, 6]. Customer reviews can help in deciding whether to launch a new product. Online shopping has several advantages over retail shopping. In retail shopping, the customers visit the shop and receive price information but less product
Predicting a product's success before launch is a challenge in today's electronic world. Based on this prediction, companies can avoid huge losses by deciding whether or not to launch a product into the market. We have implemented a Multithreaded Hash join Resilient Distributed Dataset (MHRDD) with a prediction classifier for pre-launch prediction. MHRDD helps to remove redundancy in the input dataset and improves the performance of the prediction model. The large volume of electronic word-of-mouth (e-WOM) data available on the internet, such as product reviews, comments and ratings, can be used for pre-launch product prediction. MHRDD uses a distance similarity score to identify features. To remove duplicates, a hash key and join operations are used to create a hash table of significant features. With in-memory computations and hashing on the join operations, this model reduces data redundancy. The model is scalable and can handle large datasets with good prediction accuracy. This paper presents a novel big data processing method that predicts product success before its launch in the market. The proposed method helps to identify the features that are significant for the product to be successful. Based on the pre-launch prediction, companies can reduce cost, effort and time while improving product success.
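The deduplication and hash-join steps described above can be illustrated with a minimal sketch in plain Python. This is not the authors' MHRDD implementation and does not use Spark RDDs. The function names (`hash_key`, `deduplicate`, `hash_join`), the record layout and the feature scores are all hypothetical, and the scores are placeholders, not the output of the distance similarity measure mentioned in the abstract.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_key(text: str) -> str:
    """Stable key for a review: hash of its case- and whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def deduplicate(reviews):
    """Keep the first review seen for each hash key (assumed duplicate rule)."""
    seen = {}
    for review in reviews:
        seen.setdefault(hash_key(review["text"]), review)
    return list(seen.values())


def hash_join(reviews, significant_features, workers=4):
    """Build a hash table on the feature side, then probe it from worker threads."""
    # Build phase: feature name -> score (scores here are placeholder values).
    table = {f["feature"]: f["score"] for f in significant_features}

    def probe(review):
        # Probe phase: emit one row per significant feature found in the review.
        words = set(review["text"].lower().split())
        return [(review["id"], w, table[w]) for w in sorted(words) if w in table]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the input order of the reviews.
        return [row for rows in pool.map(probe, reviews) for row in rows]


# Hypothetical sample data for illustration only.
reviews = [
    {"id": 1, "text": "Great battery and camera"},
    {"id": 2, "text": "great  battery and camera"},  # duplicate after normalization
    {"id": 3, "text": "Poor battery"},
]
features = [
    {"feature": "battery", "score": 0.9},
    {"feature": "camera", "score": 0.7},
]

unique_reviews = deduplicate(reviews)
joined = hash_join(unique_reviews, features)
```

With this sample data, review 2 is dropped as a duplicate of review 1, and `joined` pairs each remaining review with the significant features it mentions. In a distributed setting the same build-and-probe pattern would run per partition. How MHRDD partitions data and schedules threads is not specified in the abstract, so that part is left out.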