The superior I/O performance of solid-state storage (e.g., solid-state drives, SSDs) makes it an attractive replacement for traditional magnetic storage (e.g., hard disk drives, HDDs), and a growing number of storage systems are integrating solid-state storage into their architectures. To understand the impact of solid-state storage on the performance of Hadoop applications, we consider a hybrid Hadoop storage system consisting of both HDDs and SSDs and conduct a series of experiments to evaluate Hadoop performance under various system configurations. We find that Hadoop performance increases almost linearly with the fraction of SSDs in the storage system, and that the improvement is more significant for larger datasets. In addition, the performance of Hadoop applications running on SSD-dominant storage systems is insensitive to variations in block size and buffer size, which differs significantly from HDD-dominant storage systems. As the fraction of SSDs increases, Hadoop operators therefore no longer need to carefully tune block size and buffer size to achieve optimal performance. Our findings also indicate that a Hadoop storage system can be upgraded by increasing SSD capacity linearly with the scale of the applications.