Proceedings 20th IEEE International Parallel & Distributed Processing Symposium 2006
DOI: 10.1109/ipdps.2006.1639595
Conjugate gradient sparse solvers: performance-power characteristics

Abstract: We characterize the performance and power attributes of the conjugate gradient (CG) sparse solver, which is widely used in scientific applications. We use cycle-accurate simulations with SimpleScalar and Wattch, on a processor and memory architecture similar to the configuration of a node of the BlueGene/L. We first demonstrate that substantial power savings can be obtained without performance degradation if low-power modes of caches can be utilized. We next show that if Dynamic Voltage Scaling (DVS) can be use…
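The abstract centers on the conjugate gradient solver. As a point of reference, a minimal dense-matrix CG sketch follows; this is the standard textbook recurrence, not the paper's tuned sparse implementation, whose dominant cost per iteration is the sparse matrix-vector product.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-8, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A.

    Textbook CG recurrence; the solver studied in the paper operates
    on sparse matrices, but the iteration structure is identical.
    """
    x = np.zeros_like(b)
    r = b - A @ x          # initial residual
    p = r.copy()           # initial search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p                     # dominant cost: matrix-vector product
        alpha = rs_old / (p @ Ap)      # optimal step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:      # converged on residual norm
            break
        p = r + (rs_new / rs_old) * p  # conjugate direction update
        rs_old = rs_new
    return x
```

For a sparse A, the `A @ p` line becomes a sparse matrix-vector multiplication (SpMV), which is the memory-bound kernel whose power/performance behavior the paper characterizes.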


Cited by 7 publications (14 citation statements). References 19 publications.
“…Then, using the formula from (Jaleel et al., 2006; Hennessy and Patterson, 2003), the memory access time is represented as: […] is difficult in a real computing environment. However, even with a simple memory prefetcher, the value is negligibly small in our algorithm since it accesses memory in a sequential direction (Malkowski et al., 2005a; 2005b).…”
Section: Multithreaded Iterative Solver: MTCG
confidence: 99%
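The excerpt cites a memory access time formula from Hennessy and Patterson, but the formula itself was lost in extraction and is not recoverable from the excerpt. The standard expression from that text is average memory access time (AMAT) = hit time + miss rate × miss penalty; a minimal sketch, assuming that textbook form:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time per Hennessy & Patterson:
    AMAT = hit_time + miss_rate * miss_penalty.
    All times in the same unit (e.g., cycles); miss_rate in [0, 1].
    This is the standard textbook formula, not necessarily the exact
    expression used in the cited work.
    """
    return hit_time + miss_rate * miss_penalty

# e.g., 1-cycle hit, 2% miss rate, 100-cycle miss penalty -> 3.0 cycles
print(amat(1.0, 0.02, 100.0))
```

The excerpt's point is that with sequential access and a simple prefetcher, the effective miss penalty term becomes negligibly small.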
“…The matrix properties, including the name, dimension (×10³), number of nonzeros (×10⁶), and the percentage of nonzeros relative to a dense matrix of the same dimension, are: bcsstk31, 35.6, 1.2, .09%; fdm2, 32.1, .16, .01%; qa8fm, 66.1, 1.6, .03%; and msc23052, 23.0, 1.1, .21%. These matrices had been reordered using the Reverse Cuthill-McKee [17] scheme to improve the locality of access in the source vector [12], as is commonly done for tuned scientific codes.…”
Section: Sparse Scientific Computing Applications
confidence: 99%
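The excerpt notes that Reverse Cuthill-McKee reordering improves locality of access in the source vector. The kernel this targets is sparse matrix-vector multiplication, where the source vector is accessed indirectly through column indices; clustering nonzeros near the diagonal keeps those accesses close together. A minimal CSR SpMV sketch (illustrative only, not the tuned code from the paper):

```python
import numpy as np

def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x for A stored in Compressed Sparse Row (CSR) format.

    values  : nonzero entries, row by row
    col_idx : column index of each nonzero
    row_ptr : row_ptr[i]..row_ptr[i+1] delimits row i's nonzeros

    The indirect access x[col_idx[k]] is the locality bottleneck that
    bandwidth-reducing reorderings such as Reverse Cuthill-McKee improve.
    """
    n = len(row_ptr) - 1
    y = np.zeros(n)
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y
```

When the matrix bandwidth is small after reordering, consecutive `col_idx` values fall in a narrow window of `x`, so the source-vector reads hit in cache far more often.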
“…Additionally, prefetching techniques were discussed by Lin et al. [31]. The effects of prefetchers on the performance and power of sparse applications were investigated by the authors in [12].…”
Section: Related Research
confidence: 99%
“…We discuss how the memory optimizations that we developed earlier [11], [15], [16] can affect the performance of tuned and un-tuned versions of sparse matrix-vector multiplication. We consider the use of such optimizations with power-saving modes of the hardware, such as Dynamic Voltage and Frequency Scaling (DVFS) [5], to improve performance at significantly lower power levels.…”
Section: Introduction
confidence: 99%