Charles E. Leiserson scite author profile

Cilk (pronounced "silk") is a C-based runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk work-stealing scheduler, both empirically and analytically. We show that on real and synthetic applications, the "work" and "critical-path length" of a Cilk computation can be used to model performance accurately. Consequently, a Cilk programmer can focus on reducing the computation's work and critical-path length, insulated from load balancing and other runtime scheduling issues. We also prove that for the class of "fully strict" (well-structured) programs, the Cilk scheduler achieves space, time, and communication bounds all within a constant factor of optimal. The Cilk runtime system currently runs on the Connection Machine CM5 MPP, the Intel Paragon MPP, the Sun Sparcstation SMP, and the Cilk-NOW network of workstations. Applications written in Cilk include protein folding, graphic rendering, backtrack search, and the ⋆Socrates chess program, which won second prize in the 1995 World Computer Chess Championship.

Cache-oblivious algorithms

Frigo, Leiserson, Prokop, et al.

526

728

Retiming synchronous circuitry

Leiserson, Saxe

1991

Algorithmica

819

711

Abstract. This paper describes a circuit transformation called retiming in which registers are added at some points in a circuit and removed from others in such a way that the functional behavior of the circuit as a whole is preserved. We show that retiming can be used to transform a given synchronous circuit into a more efficient circuit under a variety of different cost criteria. We model a circuit as a graph in which the vertex set $V$ is a collection of combinational logic elements and the edge set $E$ is the set of interconnections, each of which may pass through zero or more registers. We give an $O(|V||E| \lg |V|)$ algorithm for determining an equivalent retimed circuit with the smallest possible clock period. We show that the problem of determining an equivalent retimed circuit with minimum state (total number of registers) is polynomial-time solvable. This result yields a polynomial-time optimal solution to the problem of pipelining combinational circuitry with minimum register cost. We also give a characterization of optimal retiming based on an efficiently solvable mixed-integer linear-programming problem.
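The graph model in this abstract can be sketched in a few lines of Python. The sketch below assumes the standard formulation in which a retiming assigns an integer lag r(v) to each vertex and an edge from u to v with w registers ends up with w + r(v) - r(u) registers. It only evaluates the clock period of a given retiming; it does not implement the paper's algorithm for finding the best one. The function names, the three-element ring circuit, and its delays are hypothetical and chosen for illustration.

```python
from collections import defaultdict, deque


def retime(edges, r):
    """Return register counts after retiming: w_r(u, v) = w(u, v) + r[v] - r[u]."""
    retimed = {}
    for (u, v), w in edges.items():
        new_w = w + r.get(v, 0) - r.get(u, 0)
        if new_w < 0:
            raise ValueError(f"illegal retiming: edge {u}->{v} would have {new_w} registers")
        retimed[(u, v)] = new_w
    return retimed


def clock_period(delays, edges):
    """Largest total delay along any path that passes through no register."""
    succ = defaultdict(list)
    indegree = {v: 0 for v in delays}
    for (u, v), w in edges.items():
        if w == 0:  # only register-free edges extend a combinational path
            succ[u].append(v)
            indegree[v] += 1

    arrival = dict(delays)  # a vertex on its own is a path of one element
    queue = deque(v for v in delays if indegree[v] == 0)
    visited = 0
    while queue:  # process vertices in topological order
        u = queue.popleft()
        visited += 1
        for v in succ[u]:
            arrival[v] = max(arrival[v], arrival[u] + delays[v])
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if visited != len(delays):
        raise ValueError("circuit contains a register-free cycle")
    return max(arrival.values())


# Hypothetical ring a -> b -> c -> a with both registers on the edge c -> a.
delays = {"a": 1, "b": 2, "c": 3}
edges = {("a", "b"): 0, ("b", "c"): 0, ("c", "a"): 2}

before = clock_period(delays, edges)
after = clock_period(delays, retime(edges, {"c": 1}))  # move one register across c
print(before, after)
```

In this example the lag r(c) = 1 moves one register from the output of c to its input, so the register-free path a, b, c is cut in two while the number of registers around the ring stays the same.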

Scheduling multithreaded computations by work stealing

Blumofe, Leiserson

1999

J. ACM

1,014

594

This paper studies the problem of efficiently scheduling fully strict (i.e., well-structured) multithreaded computations on parallel computers. A popular and practical method of scheduling this kind of dynamic MIMD-style computation is “work stealing,” in which processors needing work steal computational threads from other processors. In this paper, we give the first provably good work-stealing scheduler for multithreaded computations with dependencies. Specifically, our analysis shows that the expected time to execute a fully strict computation on $P$ processors using our work-stealing scheduler is $T_1/P + O(T_\infty)$, where $T_1$ is the minimum serial execution time of the multithreaded computation and $T_\infty$ is the minimum execution time with an infinite number of processors. Moreover, the space required by the execution is at most $S_1 P$, where $S_1$ is the minimum serial space requirement. We also show that the expected total communication of the algorithm is at most $O(P T_\infty (1 + n_d) S_{\max})$, where $S_{\max}$ is the size of the largest activation record of any thread and $n_d$ is the maximum number of times that any thread synchronizes with its parent. This communication bound justifies the folk wisdom that work-stealing schedulers are more communication efficient than their work-sharing counterparts. All three of these bounds are existentially optimal to within a constant factor.
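To make the expected-time bound above concrete (serial time divided by the processor count, plus a term proportional to the infinite-processor time), here is a minimal Python sketch. The function name, the constant `c` standing in for the factor hidden by the O-notation, and the sample numbers are assumptions for illustration only; the abstract gives no concrete values.

```python
def work_stealing_time_estimate(t1, t_inf, p, c=1.0):
    """Estimate expected runtime as t1 / p + c * t_inf.

    t1    -- serial execution time (the "work")
    t_inf -- execution time with unlimited processors (the critical path)
    p     -- number of processors
    c     -- assumed stand-in for the constant hidden by the O-notation
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    return t1 / p + c * t_inf


# Hypothetical computation: 1000 time units of work, critical path of 10.
for p in (1, 10, 100, 1000):
    print(p, work_stealing_time_estimate(1000.0, 10.0, p))
```

Under these assumed numbers the estimate is dominated by the work term while the processor count is well below the ratio of work to critical path (100 here), and by the critical-path term once the count exceeds it.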

The implementation of the Cilk-5 multithreaded language

Frigo, Leiserson, Randall

1998

SIGPLAN Not.

487

575

The fifth release of the multithreaded language Cilk uses a provably good “work-stealing” scheduling algorithm similar to the first system, but the language has been completely redesigned and the runtime system completely reengineered. The efficiency of the new implementation was aided by a clear strategy that arose from a theoretical analysis of the scheduling algorithm: concentrate on minimizing overheads that contribute to the work, even at the expense of overheads that contribute to the critical path. Although it may seem counterintuitive to move overheads onto the critical path, this “work-first” principle has led to a portable Cilk-5 implementation in which the typical cost of spawning a parallel thread is only between 2 and 6 times the cost of a C function call on a variety of contemporary machines. Many Cilk programs run on one processor with virtually no degradation compared to equivalent C programs. This paper describes how the work-first principle was exploited in the design of Cilk-5's compiler and its runtime system. In particular, we present Cilk-5's novel “two-clone” compilation strategy and its Dijkstra-like mutual-exclusion protocol for implementing the ready deque in the work-stealing scheduler.

Fat-trees: Universal networks for hardware-efficient supercomputing

Leiserson

1985

IEEE Trans. Comput.

1,053

438

EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs

Pareja, Domeniconi, Chen, et al.

2020

AAAI

624

380

Graph representation learning resurges as a trending research subject owing to the widespread use of deep learning for Euclidean data, which inspire various creative designs of neural networks in the non-Euclidean domain, particularly graphs. With the success of these graph neural networks (GNN) in the static setting, we approach further practical scenarios where the graph dynamically evolves. Existing approaches typically resort to node embeddings and use a recurrent neural network (RNN, broadly speaking) to regulate the embeddings and learn the temporal dynamics. These methods require the knowledge of a node in the full time span (including both training and testing) and are less applicable to the frequent change of the node set. In some extreme scenarios, the node sets at different time steps may completely differ. To resolve this challenge, we propose EvolveGCN, which adapts the graph convolutional network (GCN) model along the temporal dimension without resorting to node embeddings. The proposed approach captures the dynamism of the graph sequence through using an RNN to evolve the GCN parameters. Two architectures are considered for the parameter evolution. We evaluate the proposed approach on tasks including link prediction, edge classification, and node classification. The experimental results indicate a generally higher performance of EvolveGCN compared with related approaches. The code is available at https://github.com/IBM/EvolveGCN.
