Hadoop is an open-source Apache project and a software framework for distributed processing of large datasets across large clusters of commodity hardware. Here, large datasets means terabytes or petabytes of data, whereas large clusters means hundreds or thousands of nodes. Hadoop uses a master-slave architecture with one master node and thousands of slave nodes. The NameNode acts as the master node and stores all file metadata, while the DataNodes are slave nodes that store all the application data. This architecture becomes a bottleneck when a large number of small files must be processed, because the NameNode uses more memory to store the file metadata and the DataNodes consume more CPU time to process the files. This paper presents a novel technique for handling the small-file problem in Hadoop based on file merging, caching, and correlation strategies. The experimental results show that the proposed technique reduces the amount of data stored at the NameNode and the average memory usage of the DataNodes, and improves the access efficiency of small files in the Hadoop Distributed File System by up to 88.57% compared with the general solution, Hadoop Archive.
General Terms: Big Data Analytics, Small files in Hadoop.
Keywords: Hadoop, HDFS, MapReduce, small files in Hadoop, small files storage in Hadoop.
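The file-merging idea mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not describe the paper's actual merge format, so everything here is an assumption for illustration: the function names `merge_small_files` and `read_small_file`, the in-memory `dict` input, and the simple offset index are all hypothetical. The sketch only shows why merging helps: many small files become one stored object plus a compact index, so far fewer separate metadata entries are needed.

```python
import io


def merge_small_files(files):
    """Concatenate small files into one blob and build an offset index.

    files: dict mapping file name -> bytes content (hypothetical input).
    Returns (blob, index), where index maps name -> (offset, length).
    """
    buffer = io.BytesIO()
    index = {}
    for name in sorted(files):
        data = files[name]
        # Record where this file starts in the merged blob and how long it is.
        index[name] = (buffer.tell(), len(data))
        buffer.write(data)
    return buffer.getvalue(), index


def read_small_file(blob, index, name):
    """Recover one original file from the merged blob via the index."""
    offset, length = index[name]
    return blob[offset:offset + length]


# Illustrative data only; not taken from the paper.
small_files = {
    "a.txt": b"alpha",
    "b.txt": b"bravo!",
    "c.txt": b"c",
}
blob, index = merge_small_files(small_files)
```

With this layout, a lookup such as `read_small_file(blob, index, "b.txt")` reads a single slice of the merged blob instead of opening a separate file. A caching layer, as the abstract mentions, could keep the index in memory so repeated lookups avoid re-reading it.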
People's ideas and opinions are influenced by the opinions of others. Much research is under way on analyzing the reviews that people write. Sentiment analysis is the main computational technique for measuring or observing the sentiment in people's thoughts. Therefore, a method is proposed that assigns scores indicating positive and negative opinion about a product. It uses the Hadoop Distributed File System (HDFS) to store the dataset and runs on the MapReduce architecture to perform sentiment analysis.
General Terms: Sentiment Analysis, Hadoop streaming.
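As a rough illustration of how score assignment can be split into map and reduce steps, the sketch below scores reviews against a word lexicon and sums the scores per product. It is not the method from the abstract above, whose scoring rules are not given. The tiny `LEXICON`, the tab-separated `product<TAB>review` input layout, and the `mapper` and `reducer` function names are all assumptions made for this example, and the code runs locally in plain Python rather than on a Hadoop cluster.

```python
from collections import defaultdict

# Hypothetical toy lexicon; a real system would use a much larger one.
LEXICON = {"good": 1, "great": 2, "bad": -1, "terrible": -2}


def mapper(line):
    """Map step: turn one 'product<TAB>review' line into (product, score)."""
    product, _, review = line.partition("\t")
    score = sum(
        LEXICON.get(word.strip(".,!?").lower(), 0)
        for word in review.split()
    )
    return product, score


def reducer(pairs):
    """Reduce step: sum the scores emitted for each product."""
    totals = defaultdict(int)
    for product, score in pairs:
        totals[product] += score
    return dict(totals)


# Illustrative reviews only; not data from the paper.
reviews = [
    "phone\tGreat screen, good battery.",
    "phone\tTerrible speaker!",
    "laptop\tBad keyboard.",
]
scores = reducer(mapper(line) for line in reviews)
print(scores)  # {'phone': 1, 'laptop': -1}
```

A positive total suggests mostly favorable reviews and a negative total mostly unfavorable ones. Under Hadoop streaming, the same two steps would typically be separate scripts that read lines from standard input and write tab-separated key-value pairs to standard output.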
This paper proposes a new optimization method for digital image watermarking that combines Genetic Algorithms (GA), histogram analysis, and Butterworth filtering. In the proposed method, the histogram range selected for the low-frequency components is treated as a key parameter that helps improve imperceptibility and robustness against attacks. The tradeoff between perceptual transparency and robustness is treated as an optimization problem, which is solved with a Genetic Algorithm. Experimental results indicate that the approach is secure and robust to various attacks such as rotation, cropping, scaling, additive noise, and filtering. The peak signal-to-noise ratio (PSNR) and normalized cross-correlation (NC) are analyzed for a set of images, and the experiments are carried out in MATLAB R2016b.
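The two metrics named in the abstract above can be sketched as follows. PSNR is computed with its standard definition, 10 log10(MAX^2 / MSE). Normalized cross-correlation has several variants in the watermarking literature, and the abstract does not say which one the paper uses, so the form below (the inner product divided by the product of the norms) is an assumption. Images and watermarks are represented as flat lists of pixel values for simplicity, and the function names are hypothetical.

```python
import math


def psnr(original, watermarked, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(original, watermarked)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no distortion at all
    return 10 * math.log10(max_value ** 2 / mse)


def normalized_correlation(watermark, extracted):
    """One common NC variant (assumed here): <w, w'> / (||w|| * ||w'||)."""
    numerator = sum(a * b for a, b in zip(watermark, extracted))
    denominator = math.sqrt(
        sum(a * a for a in watermark) * sum(b * b for b in extracted)
    )
    return numerator / denominator


# Illustrative values only; not data from the paper.
host = [100, 100, 100, 100]
marked = [101, 99, 100, 100]
quality_db = psnr(host, marked)

embedded = [1, 0, 1, 1]
recovered = [1, 0, 0, 1]
similarity = normalized_correlation(embedded, recovered)
```

A higher PSNR means the watermarked image is closer to the original (better imperceptibility), while an NC close to 1 means the extracted watermark closely matches the embedded one (better robustness). A genetic algorithm could combine both values into a single fitness score, though the abstract does not specify how the paper weights them.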