Large, real-world graphs are famously difficult to process efficiently. Not only do they have a large memory footprint, but most graph processing algorithms also entail memory access patterns with poor locality, data-dependent parallelism, and a low compute-to-memory access ratio. Additionally, most real-world graphs have a low diameter and a highly heterogeneous node degree distribution. Partitioning these graphs while simultaneously achieving access locality and load balancing is difficult, if not impossible. This paper demonstrates the feasibility of graph processing on heterogeneous (i.e., including both CPUs and GPUs) platforms as a cost-effective approach to addressing the graph processing challenges above. To this end, this work (i) presents and evaluates a performance model that estimates the achievable performance on heterogeneous platforms; (ii) introduces TOTEM, a processing engine based on the Bulk Synchronous Parallel (BSP) model that offers a convenient environment to simplify the implementation of graph algorithms on heterogeneous platforms; and (iii) demonstrates TOTEM's efficiency by implementing and evaluating two graph algorithms (PageRank and breadth-first search). TOTEM achieves speedups close to the model's prediction and applies a number of optimizations that enable linear speedups with respect to the share of the graph offloaded to accelerators for processing.
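The BSP model structures a computation as a sequence of supersteps: each partition computes on its local data, the partitions exchange messages, and all of them then wait at a global barrier. The minimal Python sketch below applies that pattern to a level-synchronous breadth-first search over a partitioned graph. It is not TOTEM's implementation: the `owner` map is a hypothetical stand-in for placing each vertex on a CPU or a GPU, and the partitions run sequentially here, so the loop boundaries only mark where a real engine would communicate and synchronize.

```python
from collections import defaultdict


def bsp_bfs(edges, owner, source, num_parts=2):
    """Level-synchronous BFS organized as BSP supersteps.

    edges: list of directed (u, v) edges.
    owner: dict mapping every vertex to a partition id in [0, num_parts)
           (hypothetical placement, e.g. 0 = CPU, 1 = GPU).
    Returns a dict mapping each reachable vertex to its BFS level.
    """
    # Each partition stores only the out-edges of the vertices it owns.
    adj = [defaultdict(list) for _ in range(num_parts)]
    for u, v in edges:
        adj[owner[u]][u].append(v)

    # Each partition records levels only for its own vertices.
    level = [dict() for _ in range(num_parts)]
    frontier = [set() for _ in range(num_parts)]
    level[owner[source]][source] = 0
    frontier[owner[source]].add(source)
    depth = 0

    while any(frontier):
        # Computation phase: every partition expands its local frontier and
        # addresses each discovered neighbor to the partition that owns it.
        outbox = [set() for _ in range(num_parts)]
        for p in range(num_parts):
            for u in frontier[p]:
                for v in adj[p][u]:
                    outbox[owner[v]].add(v)

        # Communication phase and barrier: each owner applies the messages
        # it received, which forms the frontier of the next superstep.
        depth += 1
        next_frontier = [set() for _ in range(num_parts)]
        for p in range(num_parts):
            for v in outbox[p]:
                if v not in level[p]:
                    level[p][v] = depth
                    next_frontier[p].add(v)
        frontier = next_frontier

    merged = {}
    for part in level:
        merged.update(part)
    return merged
```

For example, `bsp_bfs([(0, 1), (1, 2), (2, 3), (0, 4)], {0: 0, 1: 0, 2: 1, 3: 1, 4: 1}, 0)` assigns levels 0, 1, 2, 3, and 1 to vertices 0 through 4. In a heterogeneous engine, only messages addressed to another partition would need to cross the CPU-GPU interconnect; this sketch buffers every neighbor uniformly to keep the superstep structure visible.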
This paper investigates the power, energy, and performance characteristics of large-scale graph processing on hybrid (i.e., CPU and GPU) single-node systems. Graph processing can be accelerated on hybrid systems by properly mapping the graph layout to processing units, such that each algorithmic task runs on the unit where it performs best. However, GPUs have a much higher Thermal Design Power (TDP), so their impact on overall energy consumption is unclear. Our evaluation using large real-world graphs and synthetic graphs with up to 1 billion vertices and 16 billion edges shows that a hybrid system is efficient in terms of both time-to-solution and energy.
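The tension between a higher power draw and a shorter runtime follows from energy being the product of average power and time. A minimal derivation, assuming each configuration draws an average power $\bar{P}$ (TDP is only an upper bound on it) over its time-to-solution $t$:

```latex
\begin{aligned}
E_{\mathrm{CPU}} &= \bar{P}_{\mathrm{CPU}}\, t_{\mathrm{CPU}}, \qquad
E_{\mathrm{hyb}} = \bar{P}_{\mathrm{hyb}}\, t_{\mathrm{hyb}} \\
E_{\mathrm{hyb}} < E_{\mathrm{CPU}}
  &\iff \frac{t_{\mathrm{CPU}}}{t_{\mathrm{hyb}}} > \frac{\bar{P}_{\mathrm{hyb}}}{\bar{P}_{\mathrm{CPU}}}
\end{aligned}
```

In words, the hybrid configuration uses less energy whenever its speedup over the CPU-only run exceeds the ratio of their average power draws, so a higher TDP alone does not determine which configuration is more energy-efficient.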
In tagging systems, users can annotate items of interest with free-form terms. A good understanding of the usage characteristics of such systems is necessary to improve the design of current and next-generation tagging systems. To this end, this work explores three aspects of user behavior in CiteULike and Connotea, two systems that include tagging features to support online personalized management of scientific publications. First, this study characterizes the degree to which users re-tag previously published items and reuse tags: 10 to 20% of the daily activity can be characterized as re-tagging, and about 75% of the activity as tag reuse. Second, we use the pairwise similarity between users' activity to characterize interest sharing in the system. We present the interest sharing distribution across the system, show that this metric encodes information about existing usage patterns, and attempt to correlate interest sharing levels with indicators of collaboration such as co-membership in discussion groups and semantic similarity of tag vocabularies. Finally, we show that interest sharing leads to an implicit structure that exhibits a natural segmentation. Throughout the paper we discuss the potential impact of our findings on the design of mechanisms that support tagging systems.
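The re-tagging, tag-reuse, and pairwise-similarity measurements can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's methodology: a post is modeled as a hypothetical `(user, item, tags)` tuple in chronological order, a post counts as re-tagging if its item was already annotated by anyone, a tag assignment counts as reuse if that tag appeared earlier, and Jaccard similarity of users' item sets stands in for the pairwise activity similarity, whose exact definition the abstract does not give.

```python
from itertools import combinations


def reuse_fractions(posts):
    """Return (re-tagging fraction of posts, reuse fraction of tag assignments).

    posts: chronological list of (user, item, set_of_tags) tuples (assumed format).
    """
    seen_items, seen_tags = set(), set()
    retagged_posts = reused_tags = total_tags = 0
    for _user, item, tags in posts:
        if item in seen_items:  # item was annotated before: re-tagging
            retagged_posts += 1
        reused_tags += len(tags & seen_tags)  # tags that appeared earlier
        total_tags += len(tags)
        seen_items.add(item)
        seen_tags |= tags
    return retagged_posts / len(posts), reused_tags / total_tags


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def interest_sharing(user_items):
    """Similarity for every unordered pair of users.

    user_items: dict mapping user -> set of items the user annotated.
    """
    return {
        (u, v): jaccard(user_items[u], user_items[v])
        for u, v in combinations(sorted(user_items), 2)
    }
```

The function computes both fractions over the whole log; restricting the counts to one day's posts while keeping `seen_items` and `seen_tags` cumulative would give per-day fractions like those reported above.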