Domestic violence against elderly people assisted in primary care

The increasing number of cores, shared caches, and memory nodes within machines introduces a complex hardware topology. High-performance computing applications now have to carefully adapt their placement and behavior to the underlying hierarchy of hardware resources and their software affinities. We introduce the Hardware Locality (hwloc) software, which gathers hardware information about processors, caches, memory nodes, and more, and exposes it to applications and runtime systems in an abstracted and portable hierarchical manner. hwloc may significantly help performance by letting runtime systems place their tasks or adapt their communication strategies according to hardware affinities. We show that hwloc can already be used by popular high-performance OpenMP or MPI software. Indeed, scheduling OpenMP threads according to their affinities, or placing MPI processes according to their communication patterns, improves performance thanks to hwloc. An optimized MPI communication strategy may also be chosen dynamically according to the location of the communicating processes in the machine and its hardware characteristics.

Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model

Agullo

Aumage

Faverge

et al. 2024

IEEE Trans. Parallel Distrib. Syst.

The emergence of accelerators as standard computing resources on supercomputers, and the resulting increase in architectural complexity, revived the need for high-level parallel programming paradigms. The sequential task-based programming model has been shown to meet this challenge efficiently on a single multicore node, possibly enhanced with accelerators, which motivated its support in the OpenMP 4.0 standard. In this paper, we show that this paradigm can also be employed to achieve high performance on modern supercomputers composed of multiple such nodes, with extremely limited changes in the user code. To prove this claim, we have extended the StarPU runtime system with an advanced inter-node data management layer that supports this model by posting communications automatically. We illustrate our discussion with the task-based tile Cholesky algorithm that we implemented on top of this new runtime system layer. We show that it enables very high productivity while achieving performance competitive with both the pure Message Passing Interface (MPI)-based ScaLAPACK Cholesky reference implementation and the DPLASMA Cholesky code, which implements another (non-sequential) task-based programming paradigm.

ForestGOMP: An Efficient OpenMP Environment for NUMA Architectures

Broquedis

Furmento

Goglin

et al. 2010

Int J Parallel Prog

StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators

Augonnet

Aumage

Furmento

et al. 2012

Enabling high-performance memory migration for multithreaded applications on LINUX

Goglin

Furmento

2009

As the number of cores per machine increases, memory architectures are being redesigned to avoid bus contention and sustain higher throughput needs. The emergence of Non-Uniform Memory Access (NUMA) constraints has caused affinities between threads and buffers to become an important decision criterion for schedulers. Memory migration enables the dynamic joint distribution of work and data across the machine, but it requires high-performance data transfers as well as a convenient programming interface. We present the improvement of the LINUX migration primitives and the implementation of a Next-touch policy in the kernel to provide multithreaded applications with an easy way to dynamically maintain thread-data affinity. Microbenchmarks show that our work enables high-performance, synchronous, and lazy memory migration within multithreaded applications. A threaded LU factorization then reveals the large improvement that our Next-touch policy model may bring to applications with complex access patterns.

Evaluation of OpenMP Dependent Tasks with the KASTORS Benchmark Suite

Virouleau

Brunet

Broquedis

et al. 2014

The recent introduction of task dependencies in the OpenMP specification provides new ways of synchronizing tasks. Application programmers can now describe the data a task will read as input and write as output, letting the runtime system resolve fine-grain dependencies between tasks to decide which task should execute next. Such an approach should scale better than the excessive global synchronization found in most OpenMP 3.0 applications. As promising as it looks, however, any new feature needs proper evaluation to encourage application programmers to embrace it. This paper introduces the KASTORS benchmark suite, designed to evaluate OpenMP task dependencies. We modified state-of-the-art OpenMP 3.0 benchmarks and data-flow parallel linear algebra kernels to make use of task dependencies. Learning from this experience, we propose extensions to the current OpenMP specification to improve the expressiveness of dependencies. We finally evaluate both the GCC/libGOMP and the CLANG/libIOMP implementations of OpenMP 4.0 on our KASTORS suite, demonstrating the interest of task dependencies compared to taskwait-based approaches.

Making the Grid Predictable through Reservations and Performance Modelling

McGough

Afzal

Darlington

et al. 2005

The Computer Journal

ICENI: Optimisation of component applications within a Grid environment

et al. 2002

Nathalie Furmento

hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications
