Abstract-Performance problems in parallel programs manifest as lack of scalability. These scalability issues are often very difficult to debug. They can stem from synchronization overhead, poor thread scheduling decisions, or contention for hardware resources, such as shared caches. Traditional profiling tools attribute program cycles to different functions, but do not generate immediate insight into issues limiting scalability. Profiling information is very program-specific and is usually processed manually by a human expert in a time-consuming and cumbersome process.Our experience in tuning performance of parallel applications led us to discover that performance tuning can be considerably simplified, and even to some degree automated, if profiling measurements are organized according to several intuitive performance factors common to most parallel programs. In this work we present these factors and propose a hierarchical framework composing them. We present three case studies where analyzing profiling data according to the proposed principle led us to improve performance of three parallel programs by a factor of 6-20×. Our work lays foundation for new ways of organizing and visualizing profiling data in performance tuning tools.
We consider the connected variant of the classic mixed search game where, in each search step, cleaned edges form a connected subgraph. We consider graph classes with bounded connected (and monotone) mixed search number and we deal with the question whether the obstruction set, with respect of the contraction partial ordering, for those classes is finite. In general, there is no guarantee that those sets are finite, as graphs are not well quasi ordered under the contraction partial ordering relation. In this paper we provide the obstruction set for k = 2, where k is the number of searchers we are allowed to use. This set is finite, it consists of 177 graphs and completely characterises the graphs with connected (and monotone) mixed search number at most 2. Our proof reveals that the "sense of direction" of an optimal search searching is important for connected search which is in contrast to the unconnected original case. We also give a double exponential lower bound on the size of the obstruction set for the classes where this set is finite.
No abstract
Shared state access conflicts are one of the greatest sources of error for fine grained parallelism in any domain. Notoriously hard to debug, these conflicts reduce reliability and increase development time. The standard task graph model dictates that tasks with potential conflicting accesses to shared state must be linked by a dependency, even if there is no explicit logical ordering on their execution. In cases where it is difficult to understand if such implicit dependencies exist, the programmer often creates more dependencies than needed, which results in constrained graphs with large monolithic tasks and limited parallelism. We propose a new technique, Synchronization via Scheduling (SvS), that uses the results of static and dynamic code analysis to manage potential shared state conflicts by exposing the data accesses of each task to the scheduler. We present an in-depth performance analysis of SvS via examples from video games, our target domain, and show that SvS performs well in comparison to software transactional memory (TM) and fine grained mutexes.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.