Panruo Wu scite author profile

Abstract: In a variety of research areas, the weighted bag of vectors and the histogram are widely used descriptors for complex objects. Both can be expressed as discrete distributions. D2-clustering pursues the minimum total within-cluster variation for a set of discrete distributions subject to the Kantorovich-Wasserstein metric. D2-clustering has a severe scalability issue, the bottleneck being the computation of a centroid distribution, called the Wasserstein barycenter, that minimizes its sum of squared distances to the cluster members. In this paper, we develop a modified Bregman ADMM approach for computing the approximate discrete Wasserstein barycenter of large clusters. In the case when the support points of the barycenters are unknown and have low cardinality, our method achieves high accuracy empirically at a much reduced computational cost. The strengths and weaknesses of our method and its alternatives are examined through experiments, and we recommend scenarios for their respective usage. Moreover, we develop both serial and parallelized versions of the algorithm. By experimenting with large-scale data, we demonstrate the computational efficiency of the new methods and investigate their convergence properties and numerical stability. The clustering results obtained on several datasets in different domains are highly competitive in comparison with some widely used methods in the corresponding areas.
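The barycenter objective described in this abstract can be written out as follows. This is a sketch in assumed notation (the symbols $P^{(k)}$, $w_i$, $x_i$, $\pi_{ij}$ and the index ranges are not taken from the page): each discrete distribution is a set of weighted support points, the squared Wasserstein distance is the optimal value of a transport problem, and the barycenter minimizes the sum of squared distances to the $N$ cluster members.

```latex
% Assumed notation: P^{(a)} = \{(w_i^{(a)}, x_i^{(a)})\}_{i=1}^{m_a}, with weights summing to 1.
W^2\bigl(P^{(a)}, P^{(b)}\bigr)
  = \min_{\pi \ge 0} \sum_{i=1}^{m_a} \sum_{j=1}^{m_b}
      \pi_{ij}\, \bigl\lVert x_i^{(a)} - x_j^{(b)} \bigr\rVert^2
  \quad \text{s.t.} \quad
  \sum_{j=1}^{m_b} \pi_{ij} = w_i^{(a)}, \qquad
  \sum_{i=1}^{m_a} \pi_{ij} = w_j^{(b)} .

% Barycenter of a cluster \{P^{(1)}, \dots, P^{(N)}\}:
P^{\ast} \in \arg\min_{P} \; \sum_{k=1}^{N} W^2\bigl(P, P^{(k)}\bigr) .
```

When the support points of $P$ are unknown, as in the case the abstract highlights, the minimization runs over both the weights and the support locations of $P$.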

The Design of Fast and Energy-Efficient Linear Solvers: On the Potential of Half-Precision Arithmetic and Iterative Refinement Techniques

Haidar, Abdelfattah, Zounon, et al. 2018

As parallel computers approach the exascale, power efficiency in high-performance computing (HPC) systems is of increasing concern. Exploiting both hardware features and algorithms is an effective way to achieve power efficiency and address the energy constraints of modern and future HPC systems. In this work, we present a novel design and implementation of an energy-efficient solution for dense linear systems of equations, which are at the heart of large-scale HPC applications. The proposed energy-efficient linear system solvers are based on two main components: (1) iterative refinement techniques, and (2) reduced-precision computing features in modern accelerators and co-processors. While most energy-efficiency approaches aim to reduce consumption with a minimal performance penalty, our method improves both performance and energy efficiency. Compared to highly optimised linear system solvers, our kernels are up to 2× faster in delivering a solution of the same accuracy, and reduce energy consumption by up to half on Intel KNL architectures. By efficiently using the Tensor Cores available in NVIDIA V100 PCIe GPUs, the speedups can be up to 4× with more than 80% reduction in energy consumption.

Investigating half precision arithmetic to accelerate dense linear system solvers

Haidar, Tomov, et al. 2017

The use of low-precision arithmetic in mixed-precision computing methods has been a powerful tool to accelerate numerous scientific computing applications. Artificial intelligence (AI) in particular has pushed this to current extremes, making use of half-precision floating-point arithmetic (FP16) in approaches based on neural networks. The appeal of FP16 lies in the high performance it can achieve on today's powerful manycore GPU accelerators, such as the NVIDIA V100, which alone can provide 120 TeraFLOPS in FP16. We present an investigation showing that other HPC applications can harness this power too, in particular the general HPC problem of solving Ax = b, where A is a large dense matrix and the solution is needed in FP32 or FP64 accuracy. Our approach is based on the mixed-precision iterative refinement technique: we generalize and extend prior advances into a framework, for which we develop architecture-specific algorithms and highly tuned implementations that resolve the main computational challenges of efficiently parallelizing, scaling, and using FP16 arithmetic in the approach on high-end GPUs. Subsequently, we show for the first time how the use of FP16 arithmetic can significantly accelerate, as well as make more energy efficient, FP32- or FP64-precision Ax = b solvers. Our results are reproducible and the developments will be made available through the MAGMA library. We quantify in practice the performance and limitations of the approach.
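The mixed-precision iterative refinement idea summarized in this abstract can be illustrated with a small pure-Python sketch. It is not the MAGMA implementation and makes several assumptions: all function names are hypothetical, FP16 is simulated by rounding through the `struct` module's half-precision format, the LU factorization omits pivoting (so the example matrix is chosen to be diagonally dominant), and the triangular solves run in double precision using the low-precision factors.

```python
import struct

def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def lu_factor_half(A):
    """LU factorization without pivoting, every result rounded to FP16.

    Assumption: A is diagonally dominant, so pivoting is not needed.
    Returns one matrix holding L (unit diagonal, below) and U (on/above).
    """
    n = len(A)
    LU = [[to_half(v) for v in row] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            LU[i][k] = to_half(LU[i][k] / LU[k][k])
            for j in range(k + 1, n):
                LU[i][j] = to_half(LU[i][j] - to_half(LU[i][k] * LU[k][j]))
    return LU

def lu_solve(LU, b):
    """Forward and back substitution in double precision with the given factors."""
    n = len(LU)
    y = list(b)
    for i in range(n):
        for j in range(i):
            y[i] -= LU[i][j] * y[j]
    x = list(y)
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= LU[i][j] * x[j]
        x[i] /= LU[i][i]
    return x

def residual(A, x, b):
    """Compute r = b - A x in double precision."""
    return [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]

def solve_refined(A, b, tol=1e-12, max_iter=50):
    """Factor once in (simulated) FP16, then refine the solution in FP64."""
    LU = lu_factor_half(A)
    x = lu_solve(LU, b)
    b_norm = max(abs(v) for v in b)
    for iteration in range(max_iter):
        r = residual(A, x, b)
        if max(abs(v) for v in r) <= tol * b_norm:
            return x, iteration
        d = lu_solve(LU, r)          # correction from the cheap factors
        x = [xi + di for xi, di in zip(x, d)]
    return x, max_iter

if __name__ == "__main__":
    A = [[4.0, 1.0, 0.5], [1.0, 5.0, 1.3], [0.7, 1.1, 6.0]]
    b = [7.5, 14.9, 20.9]            # chosen so the solution is close to [1, 2, 3]
    x, iterations = solve_refined(A, b)
    print("solution:", x, "refinement steps:", iterations)
```

The design point the sketch tries to show is that the expensive factorization is done once in low precision, while each refinement step needs only a residual in high precision and two triangular solves. Convergence relies on the matrix being reasonably well conditioned relative to FP16 accuracy.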

Algorithm-Directed Data Placement in Explicitly Managed Non-Volatile Memory

Liu, Vetter, et al. 2016

The emergence of many non-volatile memory (NVM) techniques is poised to revolutionize main memory systems because of the relatively high capacity and low lifetime power consumption of NVM. To avoid the typical limitations of NVM as main memory, NVM is usually combined with DRAM to form a hybrid NVM/DRAM system that gains the benefits of each. This integrated memory system, however, raises the question of how to manage data placement and movement across NVM and DRAM, which is critical for maximizing the benefits of the integration. The existing solutions have several limitations that obstruct their adoption in the high performance computing (HPC) domain. In particular, they cannot take advantage of application semantics, thus losing critical optimization opportunities and demanding extensive hardware extensions; they also implement persistent semantics for resilience purposes but suffer large performance and energy overhead. In this paper, we reexamine current hybrid memory designs from the HPC perspective, and aim to leverage the knowledge of numerical algorithms to direct data placement. With explicit algorithm management and limited hardware support, we optimize data movement between NVM and DRAM, improve data locality, and implement a relaxed memory persistency scheme in NVM. Our work demonstrates significant benefits of integrating algorithm knowledge into the hybrid memory design to achieve multi-dimensional optimization (performance, energy, and resilience) in HPC.

Investigating the Interplay between Energy Efficiency and Resilience in High Performance Computing

Tan, Song, et al. 2015

Energy efficiency and resilience are two crucial challenges for High Performance Computing (HPC) systems to reach exascale. While energy efficiency and resilience issues have been extensively studied individually, little has been done to understand the interplay between the two for HPC systems. Decreasing the supply voltage associated with a given operating frequency for processors and other CMOS-based components can significantly reduce power consumption. However, this often raises system failure rates and consequently increases application execution time. In this work, we present an energy-saving undervolting approach that leverages mainstream resilience techniques to tolerate the increased failures caused by undervolting. Our strategy is directed by analytic models, which capture the impacts of undervolting and the interplay between energy efficiency and resilience. Experimental results on a power-aware cluster demonstrate that our approach can save up to 12.1% energy compared to the baseline, and conserve up to 9.1% more energy than a state-of-the-art DVFS solution.

Fast Discrete Distribution Clustering Using Wasserstein Barycenter With Sparse Support
