Loop tiling and unrolling are two important program transformations for exploiting locality and exposing instruction-level parallelism, respectively. In this paper, we address the problem of selecting tile sizes and unroll factors simultaneously. We approach this problem in an architecturally adaptive manner by means of iterative compilation, where we generate many versions of a program and decide upon the best by actually executing them and measuring their execution times. We evaluate several iterative strategies. We compare the levels of optimization obtained by iterative compilation to several well-known static techniques and show that we outperform each of them on a range of benchmarks across a variety of architectures. Finally, we show how to quantitatively trade off the number of profiles needed against the level of optimization that can be reached.
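The iterative-compilation search described above can be sketched as a simple profile-and-pick loop. This is an illustrative sketch only: the candidate grid, the function names, and the synthetic cost surface standing in for real compile-and-execute timing are all assumptions, not the paper's actual implementation.

```python
import itertools

def iterative_search(candidates, run_version):
    """Profile every candidate (tile_size, unroll_factor) pair and
    return the pair with the lowest measured execution time."""
    best, best_time = None, float("inf")
    for params in candidates:
        t = run_version(params)  # execute this program version, record its runtime
        if t < best_time:
            best, best_time = params, t
    return best, best_time

# Hypothetical stand-in for "compile and execute one version": a
# synthetic cost surface replaces real wall-clock measurement here.
def synthetic_runtime(params):
    tile, unroll = params
    return abs(tile - 64) / 64 + abs(unroll - 4) / 4  # minimum at (64, 4)

candidates = list(itertools.product([16, 32, 64, 128], [1, 2, 4, 8]))
best, best_time = iterative_search(candidates, synthetic_runtime)
```

In a real setting `run_version` would invoke the compiler with the chosen tile size and unroll factor, run the resulting binary, and return the measured wall-clock time; the exhaustive loop is what makes the approach architecture-adaptive but also expensive.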
SUMMARY

Iterative compilation, where we search for the best program transformation based on profile feedback, has been highly successful in determining good program optimizations, typically outperforming all static approaches. However, this benefit has come at a price, namely a large number of executions of the target program, which in many cases is unacceptable. This paper is concerned with reducing the number of program executions needed by iterative compilation by incorporating static models. We examine how static models may be incorporated into a purely iterative approach by developing several characterized heuristics. We empirically explore the trade-off between reducing the number of executions required and the ultimate performance achieved, for differing parameters or slack factors. We focus on tiling and unrolling as transformations and on cache models as a test case for the feasibility of this approach. We show that a highly accurate model, used as a filter before profiling and appropriately parameterized, can reduce the number of executions by 50%. We also show that using a simple model to rank transformations and profiling only those with the highest ranking can reduce the number of executions even further, by up to 70%, in cases where only a limited number of profiles is available. We conclude that a production compiler might perform best using the latter approach, gaining the benefit of iterative compilation at a much reduced cost.
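The rank-then-profile heuristic described above can be sketched as follows. The toy model, the toy profiler, the candidate tile sizes, and all function names are hypothetical assumptions chosen to illustrate the idea: a cheap static model shortlists candidates, and only the shortlist is actually executed.

```python
def model_filtered_search(candidates, model_cost, profile, budget):
    """Rank all candidates by a cheap static model, profile only the
    `budget` best-ranked ones, and return the fastest of those."""
    shortlist = sorted(candidates, key=model_cost)[:budget]
    return min(shortlist, key=profile)

# Hypothetical stand-ins: the static model mis-ranks slightly, while
# the profiler reports the "true" runtime (synthetic, not wall-clock).
def toy_model(tile):
    return abs(tile - 48)   # model believes tile size 48 is best

def toy_profile(tile):
    return abs(tile - 64)   # truth: tile size 64 is best

tiles = [8, 16, 32, 48, 64, 96, 128]
best = model_filtered_search(tiles, toy_model, toy_profile, budget=3)
```

Here only 3 of the 7 candidates are ever profiled, yet the true optimum still survives the shortlist; the `budget` parameter plays the role of the paper's profile budget, trading executions against the risk of the model filtering out the best version.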
Shared cache structures and snoop cache structures for single-chip multiprocessors are evaluated and compared using an instruction-level simulator. Simulation results show that a 1-port large shared cache achieves the best performance if there is no delay penalty for arbitration and bus access. However, if a 1-clock delay is assumed for accessing the shared cache, a snoop cache with an internal wide bus and the invalidate-style NewKeio protocol outperforms the shared caches.