Kamer Kaya scite author profile

Abstract: The scheduling of independent but file-sharing tasks on heterogeneous master-slave platforms has recently found important applications in Grid environments. The scheduling heuristics recently proposed for this problem are all constructive in nature and based on a common greedy criterion which depends on the momentary completion time values of the tasks. We show that this greedy decision criterion has shortcomings in exploiting the file-sharing interaction among tasks, since completion time values are inadequate to extract the global view of this interaction. We propose a three-phase scheduling approach which involves initial task assignment, refinement, and execution ordering phases. For the refinement phase, we model the target application as a hypergraph and, with an elegant hypergraph-partitioning-like formulation, we propose using iterative-improvement-based heuristics for refining the task assignments according to two novel objective functions. The effectiveness and efficiency of iterative-improvement-based heuristics depend on the smoothness of the objective function. Unlike the turnaround time, which is the actual schedule cost, the proposed objective functions are smooth, and this enables the successful use of such heuristics. Experimental results on a wide range of synthetically generated heterogeneous master-slave frameworks show that the proposed three-phase scheduling approach performs much better than the greedy constructive approach.

Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi

Saule

Kaya

Çatalyürek

2014

Intel Xeon Phi is a recently released high-performance coprocessor which features 61 cores, each supporting 4 hardware threads, with 512-bit wide SIMD registers, achieving a peak theoretical performance of 1 Tflop/s in double precision. Many scientific applications, such as linear solvers, eigensolvers, and graph mining algorithms, involve operations on large sparse matrices. The core of most of these applications involves the multiplication of a large, sparse matrix with a dense vector (SpMV). In this paper, we investigate the performance of the Xeon Phi coprocessor for SpMV. We first provide a comprehensive introduction to this new architecture and analyze its peak performance with a number of micro benchmarks. Although the design of a Xeon Phi core is not much different from those of the cores in modern processors, its large number of cores and hyperthreading capability allow many applications to saturate the available memory bandwidth, which is not the case for many cutting-edge processors. Yet, our performance studies show that it is the memory latency, not the bandwidth, which creates a bottleneck for SpMV on this architecture. Finally, our experiments show that Xeon Phi's sparse kernel performance is very promising and even better than that of cutting-edge general purpose processors and GPUs.

Xeon Phi was released only recently, but performance evaluations already exist in the literature [6,15,16]. Eisenlohr et al. investigated the behavior of dense linear algebra factorization on Xeon Phi [6], and Stock et al. proposed an automatic code optimization approach for tensor contraction kernels [16]. Both of these works operate on dense and regular data. In a previous work, two of the authors evaluated the scalability of graph algorithms, namely coloring and breadth-first search (BFS) [15]. None of these works gives absolute performance values and, to the best of our knowledge, no such work exists in the literature. Although similar to BFS, SpMV and SpMM are different kernels (in terms of synchronization, memory access, and load balancing), and as we will show, the new coprocessor is very promising and can even perform better than existing cutting-edge CPUs and accelerators while handling sparse linear algebra.

Accelerators/coprocessors are designed for specific tasks. They not only achieve good performance but also reduce the energy usage per computation. Up to now, GPUs have been successful w.r.t. these criteria, and they are reported to perform well, especially for regular computations. However, the irregularity and sparsity of SpMV-like kernels create several problems for these architectures. In this paper, we analyze how Xeon Phi performs on two popular sparse linear algebra kernels, SpMV and SpMM. The study in [4] examined the performance of a Conjugate Gradient application which uses SpMV; however, that study concerns only a single matrix and is application oriented. To the best of our knowledge, we give the first analysis of the performance of the coprocessor on these kernels.

We conduct several experiments with 22 matrices from UFL Sparse Matrix Co...
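For readers unfamiliar with the SpMV kernel discussed in this entry, the following is a minimal serial sketch of a sparse matrix times dense vector product. It assumes the compressed sparse row (CSR) layout, which the excerpt does not name, and the function and variable names (`spmv_csr`, `row_ptr`, `col_idx`, `values`) are illustrative rather than taken from the paper. It shows only the access pattern, not the vectorized or multithreaded kernels the authors evaluate.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A * x for a sparse matrix A stored in CSR form.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx[k] and values[k] give the column and value of nonzero k.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # The indirect read x[col_idx[k]] is the irregular memory access
        # that makes SpMV sensitive to memory latency.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Hypothetical 3x3 example matrix:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]
y = spmv_csr(row_ptr, col_idx, values, x)
```

Each row's dot product is independent of the others, which is why the kernel parallelizes naturally across rows while remaining limited by how quickly entries of `x` can be fetched.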

Design, implementation, and analysis of maximum transversal algorithms

Duff

Kaya

Uçar

2011

ACM Trans. Math. Softw.

We report on careful implementations of seven algorithms for solving the problem of finding a maximum transversal of a sparse matrix. We analyse the algorithms and discuss the design choices. To the best of our knowledge, this is the most comprehensive comparison of maximum transversal algorithms based on augmenting paths. Previous papers with the same objective either do not have all the algorithms discussed in this paper or used non-uniform implementations from different researchers. We use a common base to implement all of the algorithms and compare their relative performance on a wide range of graphs and matrices. We systematize, develop and use several ideas for enhancing performance. One of these ideas improves the performance of one of the existing algorithms in most cases, sometimes so significantly that we use the improved version as the eighth algorithm in the comparisons.
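To illustrate what an augmenting-path approach to the maximum transversal problem looks like, here is a minimal sketch of the textbook depth-first-search variant. It is not one of the authors' seven implementations and includes none of their performance enhancements; the input format (`col_rows`, a list of row indices per column) and all names are assumptions made for the example.

```python
def max_transversal(n_rows, col_rows):
    """Match columns to rows using DFS-based augmenting paths.

    col_rows[j] lists the rows holding a nonzero in column j.
    Returns (size, row_match), where row_match[i] is the column
    matched to row i, or -1 if row i is unmatched.
    Recursive, so very long augmenting paths may hit Python's
    recursion limit; this is a sketch for small inputs.
    """
    row_match = [-1] * n_rows

    def augment(j, visited):
        # Try to match column j, re-matching other columns if needed.
        for i in col_rows[j]:
            if i in visited:
                continue
            visited.add(i)
            if row_match[i] == -1 or augment(row_match[i], visited):
                row_match[i] = j
                return True
        return False

    size = 0
    for j in range(len(col_rows)):
        if augment(j, set()):
            size += 1
    return size, row_match


# Hypothetical 3x3 sparsity pattern, given column by column.
size, row_match = max_transversal(3, [[0, 1], [0], [1, 2]])
```

A transversal of size equal to the matrix dimension means the rows can be permuted to give a zero-free diagonal; a smaller result indicates a structurally singular matrix.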

Betweenness centrality on GPUs and heterogeneous architectures

Sarıyüce

Kaya

Saule

et al. 2013

The betweenness centrality metric has always been intriguing for graph analyses and is used in various applications. Yet, it is one of the most computationally expensive kernels in graph mining. In this work, we investigate a set of techniques to make the betweenness centrality computations faster on GPUs as well as on heterogeneous CPU/GPU architectures. Our techniques are based on virtualization of the vertices with high degree, strided access to adjacency lists, removal of the vertices with degree 1, and graph ordering. By combining these techniques within a fine-grain parallelism, we reduced the computation time on GPUs significantly for a set of social networks. On CPUs, which can usually have access to a large amount of memory, we used a coarse-grain parallelism. We showed that heterogeneous computing, i.e., using both architectures at the same time, is a promising solution for betweenness centrality. Experimental results show that the proposed techniques can substantially reduce the centrality computation time for networks. In particular, they reduce the computation time for a graph with 234 million edges from more than 4 months to less than 12 days.
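As background for why this kernel is expensive, the sketch below computes betweenness centrality on an unweighted, undirected graph with the standard Brandes algorithm: one breadth-first search per source vertex followed by a backward accumulation of dependencies. This is a serial reference version written for illustration; it does not implement the vertex virtualization, strided access, degree-1 removal, or ordering techniques described in the abstract, and the adjacency-list input format is an assumption.

```python
from collections import deque


def betweenness(adj):
    """Brandes betweenness centrality for an unweighted, undirected graph.

    adj[v] is the list of neighbours of vertex v (vertices are 0..n-1).
    """
    n = len(adj)
    bc = [0.0] * n
    for s in range(n):
        sigma = [0] * n          # number of shortest s-v paths
        sigma[s] = 1
        dist = [-1] * n
        dist[s] = 0
        preds = [[] for _ in range(n)]
        order = []               # vertices in non-decreasing distance from s
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest vertices back to s.
        delta = [0.0] * n
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair is counted from both endpoints.
    return [b / 2.0 for b in bc]


# Hypothetical example: the path 0 - 1 - 2.
path_bc = betweenness([[1], [0, 2], [1]])
```

The outer loop runs a full traversal from every vertex, so the cost grows with the product of the vertex and edge counts, which is what makes large social networks so time-consuming without the kinds of optimizations the abstract lists.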

Task assignment in heterogeneous computing systems

Uçar

Aykanat

Kaya

et al. 2006

Journal of Parallel and Distributed Computing

116

The problem of task assignment in heterogeneous computing systems has been studied for many years with many variations. We consider the version in which communicating tasks are to be assigned to heterogeneous processors with identical communication links to minimize the sum of the total execution and communication costs. Our contributions are threefold: a task clustering method which takes the execution times of the tasks into account; two metrics to determine the order in which tasks are assigned to the processors; and a refinement heuristic which improves a given assignment. We use these three methods to obtain a family of task assignment algorithms, including multilevel ones that apply clustering and refinement heuristics repeatedly. We have implemented eight existing algorithms to test the proposed methods. Our refinement algorithm improves the solutions of the existing algorithms by up to 15%, and the proposed algorithms obtain better solutions than these refined solutions.
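The objective described in this abstract can be made concrete with a small cost evaluator. The sketch below assumes the usual formulation in which a communication cost is paid only when two communicating tasks are placed on different processors; the abstract does not spell this out, and the data layout (`exec_cost`, `comm_edges`, `assignment`) is hypothetical. It evaluates a given assignment and does not implement the authors' clustering, ordering, or refinement methods.

```python
def assignment_cost(exec_cost, comm_edges, assignment):
    """Total cost of a task-to-processor assignment.

    exec_cost[t][p]: cost of running task t on processor p.
    comm_edges: list of (u, v, c), where tasks u and v exchange data
                at cost c if they run on different processors
                (assumed model with identical links).
    assignment[t]: processor chosen for task t.
    """
    execution = sum(exec_cost[t][p] for t, p in enumerate(assignment))
    communication = sum(
        c for u, v, c in comm_edges if assignment[u] != assignment[v]
    )
    return execution + communication


# Hypothetical instance: 3 tasks, 2 processors.
exec_cost = [[2, 5], [4, 1], [3, 3]]
comm_edges = [(0, 1, 6), (1, 2, 1)]
split_cost = assignment_cost(exec_cost, comm_edges, [0, 1, 0])
together_cost = assignment_cost(exec_cost, comm_edges, [0, 0, 0])
```

In this instance, placing every task on its fastest processor is the worse choice once communication is included, which is the tension that assignment heuristics have to balance.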

Iterative-Improvement-Based Heuristics for Adaptive Scheduling of Tasks Sharing Files on Heterogeneous Master-Slave Environments