This paper compares the performance of an SQL solution that implements a relational data model with a document store named MongoDB. We report on the performance of a single node configuration of each data store and assume the database is small enough to fit in main memory. We analyze utilization of the CPU cores and the network bandwidth to compare the two data stores. Our key findings are as follows. First, for those social networking actions that read and write a small amount of data, the join operator of the SQL solution is not slower than the JSON representation of MongoDB. Second, with a mix of actions, the SQL solution either provides the same performance as MongoDB or outperforms it by 20%. Third, a middle-tier cache enhances the performance of both data stores because looking up a query result is significantly faster than processing the query with either system.
A Introduction

There is an abundance of data stores, with both the computer industry and the research arena contributing novel architectures and data models. In [10], Cattell surveys and classifies 22 data stores to motivate a quantitative analysis of the alternative designs and implementations. We study a specific aspect of this vast multi-faceted topic, namely, a comparison of an industrial strength relational database management system (RDBMS) named SQL-X¹ and a NoSQL document store named MongoDB. While SQL-X implements a relational data model [12], MongoDB implements a JSON representation of data [14]. Each offers a rich set of design choices. We use the BG [5] benchmark to exercise the different capabilities of each data store. This social networking benchmark consists of a database and eleven actions (see Table 1) that either read or write a small amount of data from the database.

A shorter version of this paper appeared in the ACM International Conference on Information and Knowledge Management (CIKM), San Francisco, CA, Oct 2013.
¹ Due to a licensing agreement, we cannot disclose the identity of this system.

While SQL-X does not scale horizontally, MongoDB scales to a large number of nodes. In addition to impacting the performance of a single node instance of each data store, the physical organization of data impacts the horizontal scalability of MongoDB. While both are important, we focus on the performance of a single node instance of each data store for the following reasons.

First, it provides insights into the tradeoffs associated with two alternative logical data designs, namely, relational and JSON. An interesting finding is that the use of the join operator is not slower than the JSON representation, see Section D.

Second, while BG's interactive social networking actions are simple, they interact in complex ways to offer a wide range of design choices. We show it is beneficial to move the work of read actions to write actions when the workload is dominated by read actions. (According to Facebook, queries make up more than 99% of their workload [3,28].) Materialized views are not appropriate because they provide either a very low perfor...
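
The contrast between the two logical designs, and the idea of shifting work from read actions to write actions, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the table and field names, the sample users, and the use of SQLite and plain Python dictionaries as stand-ins for SQL-X and MongoDB. It is not the schema or code used in the paper.

```python
import sqlite3


def build_relational():
    """Hypothetical normalized design: users and friendships in separate tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (userid INTEGER PRIMARY KEY, username TEXT);
        CREATE TABLE friendship (userid1 INTEGER, userid2 INTEGER);
        """
    )
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(1, "alice"), (2, "bob"), (3, "carol")],
    )
    conn.executemany("INSERT INTO friendship VALUES (?, ?)", [(1, 2), (1, 3)])
    return conn


def list_friends_relational(conn, userid):
    """Read action: a join resolves friend ids to usernames at query time."""
    rows = conn.execute(
        "SELECT u.username FROM friendship f "
        "JOIN users u ON u.userid = f.userid2 "
        "WHERE f.userid1 = ? ORDER BY u.username",
        (userid,),
    )
    return [row[0] for row in rows]


def build_documents():
    """Hypothetical JSON-style design: each user document embeds its friends."""
    return {
        1: {
            "username": "alice",
            "friends": [
                {"userid": 2, "username": "bob"},
                {"userid": 3, "username": "carol"},
            ],
            "friend_count": 2,
        },
        2: {"username": "bob", "friends": [], "friend_count": 0},
        3: {"username": "carol", "friends": [], "friend_count": 0},
    }


def list_friends_document(docs, userid):
    """Read action: one document lookup, no join."""
    return sorted(f["username"] for f in docs[userid]["friends"])


def add_friend_document(docs, userid, friendid):
    """Write action does extra work (copy the name, bump the counter)
    so that later reads stay cheap."""
    friend = docs[friendid]
    docs[userid]["friends"].append(
        {"userid": friendid, "username": friend["username"]}
    )
    docs[userid]["friend_count"] += 1
```

In this sketch the relational read pays for a join on every call, while the document read is a single lookup whose cost was prepaid by the write. Which side wins in practice depends on the data store and workload, which is the question the paper's measurements address.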