In this paper we propose the use of a PCI-based programmable protocol controller for hiding communication and coherence overheads in software DSMs. Our protocol controller provides three different types of overhead tolerance: a) moving basic communication and coherence tasks away from computation processors; b) prefetching of diffs; and c) generating and applying diffs with hardware assistance. We evaluate the isolated and combined impact of these features on the performance of TreadMarks. We also compare performance against two versions of the Shrimp-based AURC protocol. Using detailed execution-driven simulations of a 16-node network of workstations, we show that the greatest performance benefits provided by our protocol controller come from our hardware-supported diffs. Reducing the burden of communication and coherence transactions on the computation processor is also beneficial, but to a smaller extent. Prefetching is not always profitable. Our results show that our protocol controller can improve running time by up to 50% for TreadMarks, which means that it can double the TreadMarks speedups. The overlapping implementation of TreadMarks performs as well as or better than AURC for 5 of our 6 applications. We conclude that the simple hardware support we propose allows for the implementation of high-performance software DSMs at low cost. Based on this conclusion, we are building the NCP 2 parallel system at COPPE/UFRJ.
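For context on the diff mechanism the controller accelerates, the sketch below shows the twin-and-diff scheme used by TreadMarks-style multiple-writer protocols in plain Python. The word size, the (offset, data) diff encoding, and the function names are illustrative assumptions, not the paper's hardware design.

```python
# Minimal sketch of the twin/diff mechanism in TreadMarks-style
# multiple-writer protocols. Word size and the (offset, data) diff
# encoding are illustrative assumptions, not the exact wire format.

WORD = 4  # bytes per comparison unit (assumed)

def make_twin(page: bytes) -> bytes:
    """On the first write fault, save a pristine copy of the page."""
    return bytes(page)

def create_diff(twin: bytes, page: bytes) -> list[tuple[int, bytes]]:
    """Compare the dirty page against its twin and record changed words."""
    diff = []
    for off in range(0, len(page), WORD):
        if page[off:off + WORD] != twin[off:off + WORD]:
            diff.append((off, page[off:off + WORD]))
    return diff

def apply_diff(page: bytearray, diff: list[tuple[int, bytes]]) -> None:
    """Patch a remote copy of the page with the received diff."""
    for off, data in diff:
        page[off:off + len(data)] = data

# Example: one writer modifies a 16-byte "page".
twin = make_twin(b"\x00" * 16)
page = bytearray(twin)
page[4:8] = b"\xde\xad\xbe\xef"
d = create_diff(twin, bytes(page))
remote = bytearray(twin)
apply_diff(remote, d)
assert remote == page
```

The word-by-word comparison and patch application loops are exactly the work the abstract proposes offloading to the protocol controller, which is why hardware-supported diffs yield the largest gains.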
In this paper we propose and evaluate the Adaptive++ technique, a novel runtime-only data prefetching strategy for software-based distributed shared-memory systems (software DSMs). Adaptive++ improves the performance of regular parallel applications running on software DSMs by using the past history of memory access faults to adapt between repeated-phase and repeated-stride prefetching modes. Adaptive++ does not issue prefetches during periods when the application is not exhibiting one of these two types of behavior and is thus behaving irregularly. Through detailed execution-driven simulations of several applications, we show that our prefetching technique is very successful at reducing the data access overheads of regular applications running on the TreadMarks software DSM. Adaptive++ also reduces the overhead of applications that are not strictly regular but that exhibit periods of regularity. In terms of overall performance, our results show that Adaptive++ can provide speedup improvements as significant as 34% on 16 processors. A direct comparison against two runtime-only prefetching techniques proposed thus far shows that Adaptive++ is consistently competitive in terms of performance, while being able to optimize a larger set of applications. Our main conclusion is that Adaptive++ should definitely be considered by software DSM designers as an effective way of tolerating the overhead of remote data accesses.
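To make the two prefetching modes concrete, here is a minimal sketch of how a runtime might choose between repeated-stride and repeated-phase prefetching from the fault history; the window size, prefetch depth, and function names are assumptions for illustration, not the published Adaptive++ heuristics.

```python
# Sketch of adaptive mode selection from page-fault history.
# Window sizes, depth, and matching rules are illustrative assumptions.

def detect_stride(faults: list[int]) -> int | None:
    """Return a constant stride if recent faults form one, else None."""
    if len(faults) < 3:
        return None
    strides = [b - a for a, b in zip(faults, faults[1:])]
    return strides[0] if len(set(strides)) == 1 and strides[0] != 0 else None

def predict(prev_phase: list[int], cur_faults: list[int],
            depth: int = 2) -> list[int]:
    """Pick pages to prefetch, or nothing when behavior looks irregular."""
    stride = detect_stride(cur_faults[-3:])
    if stride is not None:                      # repeated-stride mode
        return [cur_faults[-1] + stride * i for i in range(1, depth + 1)]
    if cur_faults and cur_faults == prev_phase[:len(cur_faults)]:
        # Repeated-phase mode: this phase is replaying the previous one,
        # so prefetch the pages that came next last time.
        return prev_phase[len(cur_faults):len(cur_faults) + depth]
    return []                                   # irregular: issue no prefetches

# Example: faults at pages 10, 12, 14 suggest stride 2 -> prefetch 16, 18.
print(predict(prev_phase=[], cur_faults=[10, 12, 14]))  # [16, 18]
```

The empty-list fallback mirrors the abstract's key point: when neither behavior is detected, the technique stays quiet rather than issuing useless prefetches.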
This paper presents a workflow scheduling model built on a hybrid bi-criteria scheduling algorithm that follows a class-of-service approach. The model draws on the analysis of several related works on resource scheduling, incorporating the most relevant aspects they consider and addressing some of the shortcomings they point out. It aims to optimize the criteria chosen by users, following the priority order and the variation limit they specify. To validate the model, a set of tests was performed comparing its performance against the Join the Shortest Queue scheduling algorithm. The analysis of the results showed an improvement in overall quality as well as a performance gain for users, since their priority criteria are met.
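As a concrete illustration of bi-criteria selection under a variation limit, the sketch below first finds the best value of the user's primary criterion, then picks the best secondary-criterion value among candidates within the allowed deviation. The criterion names and the 10% default limit are assumptions for illustration, not values from the paper.

```python
# Sketch of bi-criteria resource selection with a user-set variation limit.
# Criterion names and the default 10% limit are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    time: float   # estimated completion time (primary criterion here)
    cost: float   # monetary cost (secondary criterion here)

def schedule(resources: list[Resource], limit: float = 0.10) -> Resource:
    """Pick the cheapest resource among those whose time is within
    `limit` (as a fraction) of the best achievable time."""
    best_time = min(r.time for r in resources)
    candidates = [r for r in resources if r.time <= best_time * (1 + limit)]
    return min(candidates, key=lambda r: r.cost)

pool = [Resource("A", time=100, cost=9.0),
        Resource("B", time=105, cost=4.0),   # 5% slower but much cheaper
        Resource("C", time=150, cost=1.0)]   # outside the 10% limit
print(schedule(pool).name)  # B
```

Swapping the roles of `time` and `cost`, or the value of `limit`, reproduces the user-specified priority order and variation limit the abstract describes.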
This paper proposes and evaluates the Home-based Adaptive Protocol (HAP), a software distributed shared-memory system. HAP performs two key functions that distinguish it from most other distributed shared-memory systems: detection of sharing patterns and behavior adaptation based on these patterns. Detection consists of identifying any change in the sharing pattern of a shared page. Adaptation consists of using a strategy specific to the detected sharing pattern to optimize the performance of the system. More specifically, HAP uses updates to maintain the coherence of single-writer pages, which fall under the migratory and producer-consumer sharing patterns. Invalidations are used to maintain the coherence of multiple-writer pages, which can potentially be falsely shared. As part of HAP's adaptation strategy, we dynamically assign homes to pages based on their sharing patterns. We performed preliminary experiments on an 8-node cluster of PCs. Our results show that the current implementation of HAP substantially improves the performance of single-writer applications in which shared pages are modified in critical sections protected by locks. The results also indicate potential improvements in the performance of applications exhibiting other sharing patterns, such as producer-consumer and single-writer between barriers. However, the detection and adaptation techniques for these patterns must be redesigned to realize the performance gains that an adaptive system can achieve.
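To suggest the kind of detection HAP performs, the sketch below classifies each page by the set of writers observed between synchronization points and switches between update and invalidate coherence accordingly, migrating the home to a lone writer. The classification rule, migration policy, and all names are assumptions for illustration, not HAP's actual implementation.

```python
# Sketch of per-page sharing-pattern detection and protocol adaptation.
# The classification and home-migration rules are illustrative assumptions.

from collections import defaultdict

class PageState:
    def __init__(self):
        self.writers: set[int] = set()   # writers seen this interval
        self.protocol = "invalidate"     # conservative default
        self.home: int | None = None

pages: dict[int, PageState] = defaultdict(PageState)

def record_write(page_id: int, node: int) -> None:
    pages[page_id].writers.add(node)

def end_interval(page_id: int) -> None:
    """At a synchronization point, reclassify the page and adapt."""
    st = pages[page_id]
    if len(st.writers) == 1:
        writer = next(iter(st.writers))
        st.protocol = "update"      # single-writer: push updates eagerly
        st.home = writer            # migrate the home to the lone writer
    elif len(st.writers) > 1:
        st.protocol = "invalidate"  # possibly falsely shared: invalidate
    st.writers.clear()

# Example: node 3 is the only writer of page 7 in this interval.
record_write(7, node=3)
end_interval(7)
print(pages[7].protocol, pages[7].home)  # update 3
```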
Scientific visualization is a computer-based field concerned with techniques that allow scientists to create graphical representations from datasets generated by computational simulations or acquisition instruments. To address the computational cost of visualization tasks, especially for large datasets, researchers have explored grid environments as a platform for their parallel evaluation. However, it is not trivial to adapt each visualization technique to run in grid environments. A desirable alternative would separate the specifics of data and process distribution in grids from the visualization computation logic. In this work we claim that QEF (a query evaluation framework) provides scientific visualization with exactly this separation. Visualization techniques are modeled as operators in an algebra and integrated with a set of control operators that manage data distribution, leading to a parallel QEP (query execution plan). We show the benefits of parallelization for two of these techniques: particle tracing and volume rendering. For these techniques, our experiments demonstrate many positive aspects of the presented solution, as well as opportunities for future work.
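To suggest what modeling visualization steps as query operators can look like, here is a minimal sketch of an iterator-style operator pipeline in which a hypothetical control operator partitions particles across workers before a visualization operator processes them. All class and method names, and the round-robin partitioning, are assumptions for illustration, not QEF's actual API.

```python
# Sketch of an iterator-style operator pipeline forming a parallel QEP.
# All class names are illustrative assumptions, not QEF's actual API.

from typing import Iterator

class Operator:
    def produce(self) -> Iterator:
        raise NotImplementedError

class ParticleSource(Operator):
    """Leaf operator: emits seed particles for particle tracing."""
    def __init__(self, seeds):
        self.seeds = seeds
    def produce(self):
        yield from self.seeds

class Split(Operator):
    """Control operator: partitions its input across `n` workers."""
    def __init__(self, child: Operator, n: int, worker: int):
        self.child, self.n, self.worker = child, n, worker
    def produce(self):
        for i, item in enumerate(self.child.produce()):
            if i % self.n == self.worker:   # round-robin partitioning
                yield item

class Trace(Operator):
    """Visualization operator: advects each particle one step (stubbed)."""
    def __init__(self, child: Operator):
        self.child = child
    def produce(self):
        for x, y in self.child.produce():
            yield (x + 0.1, y)              # placeholder advection step

# QEP for worker 0 of 2: Trace(Split(ParticleSource)).
plan = Trace(Split(ParticleSource([(0, 0), (1, 1), (2, 2)]), n=2, worker=0))
print(list(plan.produce()))  # [(0.1, 0), (2.1, 2)]
```

Because the control operator (`Split`) and the visualization operator (`Trace`) share one interface, distribution concerns stay out of the visualization logic, which is the separation the abstract argues for.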