We develop approximation algorithms for the problem of placing replicated data in arbitrary networks, where the nodes may both issue requests for data objects and have capacity for storing data objects so as to minimize the average data-access cost. We introduce the data placement problem to model this problem. We have a set of caches F , a set of clients D, and a set of data objects O. Each cache i can store at most u i data objects. Each client j ∈ D has demand d j for a specific data object o(j) ∈ O and has to be assigned to a cache that stores that object. Storing an object o in cache i incurs a storage cost of f o i , and assigning client j to cache i incurs an access cost of d j c ij . The goal is to find a placement of the data objects to caches respecting the capacity constraints, and an assignment of clients to caches so as to minimize the total storage and client access costs. We present a 10-approximation algorithm for this problem. Our algorithm is based on rounding an optimal solution to a natural linear-programming relaxation of the problem. One of the main technical challenges encountered during rounding is to preserve the cache capacities while incurring only a constant-factor increase in the solution cost. We also introduce the connected data placement problem to capture settings where write-requests are also issued for data objects, so that one requires a mechanism to maintain consistency of data. We model this by requiring that all caches containing a given object be connected by a Steiner tree to a root for that object, which issues a multicast message upon a write to (any copy of) that object. The total cost now includes the cost of these Steiner trees. We devise a 14-approximation algorithm for this problem. We show that our algorithms can be adapted to handle two variants of the problem: (a) a k-median variant, where there is a specified bound on the number of caches that may contain a given object, and (b) a generalization where objects have lengths and the total length of the objects stored in any cache must not exceed its capacity.
ÐSince instruction level parallelism in basic blocks is often limited, compilers increase performance by creating superblocks that allow operations to be issued speculatively. This is difficult in general because each branch competes for the processor's limited resources. Previous work manages the performance trade-offs that exist between branches only indirectly. We show here that dependence and resource constraints can be used to gather explicit knowledge about scheduling trade-offs between branches. This paper's first contribution is a set of new, tighter lower bounds on the execution times of superblocks that specifically account for the dependence and resource conflicts between pairs of branches. This paper's second contribution is a novel superblock scheduling heuristic that finds high performance schedules by determining the operations that each branch needs to be scheduled early and selecting branches with compatible needs that favor beneficial branch trade-offs. Performance evaluations for superblocks from SPECint95 indicate that our bounds are very tight and that our scheduling heuristic outperforms well-known superblock scheduling algorithms. Index TermsÐSuperblock, scheduling heuristic, lower bound, ILP compiler technique.
Modern compiler transformations that eliminate redundant computations or reorder instructions, such as partial redundancy elimination and instruction scheduling, are very effective in improving application performance but tend to create longer and potentially more complex live ranges. Typically the task of dealing with the increased register pressure is left to the register allocator. To avoid introduction of spill code which can reduce or completely eliminate the benefit of earlier optimizations, researchers have developed techniques such as live range splitting and rematerialization. This paper describes prematerialization (PM), a novel method for reducing register pressure for VLIW architectures with nop instructions. PM and rematerialization both select "never killed" live ranges and break them up by introducing one or more definitions close to the uses. However, while rematerialization is applied to live ranges selected for spilling during register allocation, PM relies on the availability of nop instructions and occurs prior to register allocation. PM simplifies register allocation by creating live ranges that are easier to color and less likely to spill. We have implemented prematerialization in HP-UX production compilers for the Intel ® Itanium ® architecture. Performance evaluation indicates that the proposed technique is effective in reducing register pressure inherent in highly optimized code.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.