Designing Energy Efficient Communication Runtime Systems for Data Centric Programming Models

Vishnu, Abhinav; Song, Shuaiwen Leon; Márquez, Andrés; Barker, Kevin; Kerbyson, Darren J.; Cameron, Kirk W.; Balaji, Pavan

doi:10.1109/greencom-cpscom.2010.133

Cited by 19 publications

(5 citation statements)

References 37 publications


“…The approaches that target the Message Passing Interface (MPI) applications mainly involve mitigation of workload imbalance between the process (slack) [4,16,29,41]. Other MPI-centric solutions address cases where the processor cores wait on the memory or network [5,22,26,43,48,49]. Concurrency throttling has been widely used by adapting the thread count in OpenMP programs that are memory-constrained to reduce power consumption [12,13,31,37].…”

Section: Related Work (mentioning)

confidence: 99%

Cuttlefish: Library for Achieving Energy Efficiency in Multicore Parallel Programs

Kumar, Gupta, Kumar et al. 2021

Preprint


A low-cap power budget is challenging for exascale computing. Dynamic Voltage and Frequency Scaling (DVFS) and Uncore Frequency Scaling (UFS) are the two widely used techniques for limiting an HPC application's energy footprint. However, existing approaches fail to provide a unified solution that can work with different types of parallel programming models and applications. This paper proposes Cuttlefish, a programming-model-oblivious C/C++ library for achieving energy efficiency in multicore parallel programs running on Intel processors. An online profiler periodically reads model-specific registers to discover a running application's memory access pattern. Using a combination of DVFS and UFS, Cuttlefish then dynamically adapts the processor's core and uncore frequencies, thereby improving its energy efficiency. The evaluation on a 20-core Intel Xeon processor, using a set of widely used OpenMP benchmarks consisting of several irregular-tasking and work-sharing pragmas, achieves geometric mean energy savings of 19.4% with a 3.6% slowdown. CCS Concepts: • Software and its engineering → Power management.



“…The more sophisticated ones scale processor frequency on different intervals of application runtime while attempting to predict accurately the performance effects from the DVFS. Such approaches may be broadly classified into two types: One that first divides the application into execution intervals of predefined duration and then uses the performance counters to determine a suitable frequency for them [7,10,11]; and the other that first determines communication intervals in parallel applications that use either explicit message passing [6,15,22,23] or global address-space primitives [24] and then scales the frequency for those intervals, usually based on the variation of the MIPS (million instructions per second) metric at different P-states. Typically these approaches first choose a (often user-defined) performance loss (PL) tolerance for the application and then try to maximize energy savings under this PL as constraint.…”

Section: Employing DVFS (mentioning)

confidence: 99%

Runtime power-aware energy-saving scheme for parallel applications

Sundriyal, Sosonkina 2017

IJHPSA


Energy consumption has become a major design constraint in modern computing systems. With the advent of petaflops architectures, power-efficient software stacks have become imperative for scalability. Modern processors provide techniques, such as dynamic voltage and frequency scaling (DVFS), to improve energy efficiency on the fly. Without careful application, however, DVFS and throttling may cause significant performance loss due to the system overhead. Typically, these techniques are used by constraining the application performance loss a priori and seeking energy savings under that constraint. This paper discusses potential drawbacks of such usage and proposes an energy-saving scheme that takes into account the instantaneous processor power consumption as reported by the Running Average Power Limit (RAPL) technology from Intel. Thus, the need for the user to define a performance loss tolerance a priori is avoided. Experiments performed on the NAS benchmarks show that the proposed scheme saves more energy than approaches based on a predefined performance loss.
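The performance-loss-constrained approach that the excerpt and abstract above describe can be illustrated with a minimal sketch. This is not the method of any of the cited papers: the function name `pick_frequency`, the MIPS table, and the assumption that a lower frequency always saves more energy are all hypothetical simplifications introduced here for illustration.

```python
def pick_frequency(mips_by_freq, pl_tolerance):
    """Pick the lowest frequency whose relative MIPS loss stays within tolerance.

    mips_by_freq: dict mapping frequency in GHz to the MIPS measured at that
                  frequency (hypothetical profiling data).
    pl_tolerance: allowed performance loss as a fraction, e.g. 0.05 for 5%.

    Assumption: lower frequency means lower energy, so the lowest admissible
    frequency is treated as the most energy-saving choice.
    """
    f_max = max(mips_by_freq)
    baseline = mips_by_freq[f_max]
    # Walk frequencies from lowest to highest and stop at the first one
    # whose MIPS drop relative to the highest frequency is acceptable.
    for freq in sorted(mips_by_freq):
        loss = (baseline - mips_by_freq[freq]) / baseline
        if loss <= pl_tolerance:
            return freq
    return f_max


# Hypothetical measurements for one communication interval.
measured = {1.2: 900.0, 1.8: 960.0, 2.4: 1000.0}
chosen = pick_frequency(measured, pl_tolerance=0.05)
```

With these made-up numbers, running at 1.2 GHz costs 10% of the baseline MIPS and is rejected, while 1.8 GHz costs 4% and is accepted. The abstract's criticism applies directly to this sketch: the tolerance has to be supplied up front by the user.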


“…The other approaches primarily focus on scaling processor frequency during slack or communication operations during application runtime. The techniques in the past have targeted communication intervals in parallel applications that use either explicit message passing Lowenthal 2005, Lim, Freeh, and Lowenthal 2006) or global address-space primitives (Vishnu, Song, Marquez, Barker, Kerbyson, Cameron, and Balaji 2010) and then scales the frequency for those intervals. Oversubscribing the processor cores (Iancu, Hofmeyr, Blagojevic, and Zheng 2010) is another technique which can be used to reduce execution time and lower power consumption of a parallel application.…”

Section: Related Work (mentioning)

confidence: 99%

Evaluating Effects of Application Based and Automatic Energy Saving Strategies On NWChem

2017

25th High Performance Computing Symposium (HPC 2017)


High-performance application developers are becoming increasingly aware of the effects of rising energy consumption on the costs and reliability of modern computing systems. A traditional way to achieve energy savings is to change the processor frequency dynamically during application execution. Several techniques have been proposed in the past at the application, library, and transparent levels. In this work, two such techniques, at the application and transparent levels, are evaluated in terms of their effects on the execution time and energy consumption of different algorithms in the quantum chemistry package NWChem. Experimental results show that there is no clear winner between the two methods, since the transparent-level strategy makes decisions without intimate knowledge of the application, while the strategy based solely on the application does not take into account the platform characteristics at runtime. Hence, it is argued that the best strategy would be a hybrid of the two levels.

