Abstract. Social graph analytics has become very popular, with companies like Zynga, LinkedIn, and Facebook seeking to derive the most value from their respective social networks. It is a common belief that relational databases are ill-equipped to deal with graph problems, resulting in the use of MapReduce implementations or special-purpose graph analysis engines. We challenge this belief by presenting a few use cases that Vertica has successfully solved with simple SQL over a high-performance relational database engine.

Keywords: Analytics, Graphs, Data Mining, Vertica, Social Networks, MapReduce, Hadoop, Pig, Influencers, K-core.

Introduction

Social networks have become a central feature of our online lives, for consumers and enterprises alike. From FarmVille to LinkedIn to Groupon, many new businesses revolve around knowing your friends and leveraging their collective knowledge, behaviors, opinions, and buying power. Indeed, for any data-driven enterprise seeking to provide a personalized and relevant customer experience, web analytics alone is no longer enough: effective real-time social network analytics can reap significant rewards. The heart of social network analytics is solving graph problems on large volumes of data, at scale and with high performance.

It is a common misconception that relational databases are ill-equipped to deal with graph problems, resulting in the use of custom-coded implementations or special-purpose graph analysis engines. We challenge this belief by presenting two use cases that Vertica has successfully solved with simple SQL over a high-performance relational database engine. The first use case is to find the influencers in a social graph and to show how they can be used for A/B testing of products. The second use case is to count the triangles in a graph and to compare solutions written in SQL, Hadoop/MapReduce, and Pig.
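To make the triangle-counting use case concrete, here is a minimal sketch of how the problem can be expressed as a SQL self-join. It is not the query from the paper: it assumes a hypothetical table `edges(src, dst)`, runs on SQLite through Python's standard `sqlite3` module rather than on Vertica, and uses an illustrative function name, `count_triangles`. Each undirected edge is stored once with `src < dst`, so every triangle is matched by exactly one ordering of its three vertices.

```python
import sqlite3


def count_triangles(edges):
    """Count distinct triangles in an undirected graph with a SQL self-join.

    `edges` is an iterable of (a, b) vertex-id pairs. The table and column
    names below are assumptions made for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")

    # Store each undirected edge once, oriented so that src < dst.
    # Self-loops and duplicate edges are dropped.
    canonical = {(min(a, b), max(a, b)) for a, b in edges if a != b}
    conn.executemany("INSERT INTO edges VALUES (?, ?)", sorted(canonical))

    # For vertices a < b < c, a triangle is the edge set (a,b), (b,c), (a,c).
    # e1 supplies (a,b), e2 supplies (b,c), and e3 closes the triangle.
    query = """
        SELECT COUNT(*)
        FROM edges e1
        JOIN edges e2 ON e1.dst = e2.src
        JOIN edges e3 ON e3.src = e1.src AND e3.dst = e2.dst
    """
    (count,) = conn.execute(query).fetchone()
    conn.close()
    return count
```

For example, `count_triangles([(1, 2), (2, 3), (1, 3), (3, 4)])` finds the single triangle on vertices 1, 2, and 3. The same three-way join shape should carry over to other SQL engines, though indexing and join strategy will differ.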
Facebook's graph store TAO, like many other distributed data stores, traditionally prioritizes availability, efficiency, and scalability over strong consistency or isolation guarantees to serve its large, read-dominant workloads. As product developers build diverse applications on top of this system, they increasingly seek transactional semantics. However, providing advanced features for select applications while preserving the system's overall reliability and performance is a continual challenge. In this paper, we first characterize developer desires for transactions that have emerged over the years and describe the current failure-atomic (i.e., write) transactions offered by TAO. We then explore how to introduce an intuitive read transaction API. We highlight the need for atomic visibility guarantees in this API with a measurement study on potential anomalies that occur without stronger isolation for reads. Our analysis shows that 1 in 1,500 batched reads reflects partial transactional updates, which complicate the developer experience and lead to unexpected results. In response to our findings, we present the RAMP-TAO protocol, a variation based on the Read Atomic Multi-Partition (RAMP) protocols that can be feasibly deployed in production with minimal overhead while ensuring atomic visibility for a read-optimized workload at scale.
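The notion of a batched read that "reflects partial transactional updates" can be illustrated with a small check for fractured reads, the anomaly that atomic visibility rules out. This is a sketch under stated assumptions, not TAO's API and not the RAMP-TAO protocol: the `Version` record, its `txn_ts` and `siblings` fields, and the function `find_fractured_reads` are hypothetical names. It assumes each version carries the timestamp of the transaction that wrote it and the set of other keys written by that same transaction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """One version of a key (hypothetical record for this sketch)."""
    value: str
    txn_ts: int                       # timestamp of the writing transaction
    siblings: frozenset = frozenset()  # other keys written by that transaction


def find_fractured_reads(batch):
    """Return sorted (key, stale_sibling) pairs that violate atomic visibility.

    `batch` maps each key to the Version a batched read returned. A read is
    fractured if it sees a transaction's write to one key but an older
    version of another key that the same transaction also wrote.
    """
    fractured = []
    for key, version in batch.items():
        for sibling in version.siblings:
            other = batch.get(sibling)
            # The sibling was read, but at a version older than the
            # transaction that wrote `key`: a partial update is visible.
            if other is not None and other.txn_ts < version.txn_ts:
                fractured.append((key, sibling))
    return sorted(fractured)
```

In this sketch, a batch that returns `x` as written by transaction 2 (which also wrote `y`) alongside `y` from transaction 1 is reported as `[("x", "y")]`, while a batch that returns both keys at transaction 2 yields an empty list.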
The continued emergence of large social network applications has introduced a scale of data and query volume that challenges the limits of existing data stores. However, few benchmarks accurately simulate these request patterns, leaving researchers in short supply of tools to evaluate and improve upon these systems. In this paper, we present a new benchmark, TAOBench, that captures the social graph workload at Meta. We open source workload configurations along with a benchmark that leverages these request features to both accurately model production workloads and generate emergent application behavior. We ensure the integrity of TAOBench's workloads by validating them against their production counterparts. We also describe several benchmark use cases at Meta and report results for five popular distributed database systems to demonstrate the benefits of using TAOBench to evaluate system tradeoffs as well as identify and address performance issues. Our benchmark fills a gap in the available tools and data that researchers and developers have to inform system design decisions.