In this paper, we present an approach for determining scheduling functions suitable for the design of processor arrays. The considered scheduling functions support a subsequent LSGP partitioning of the processor array by allowing the tasks of those processors of the full-size array that are mapped onto one processor of the partitioned array to be executed in an arbitrary order. Several constraints are derived to ensure the causality of computations and to prevent access conflicts on both modules and registers. We propose an optimization problem that generates the scheduling functions and outline its implementation as an integer linear program. The proposed methods are also applicable to the mapping of algorithms onto parallel architectures. In this case, the scheduling function produces identical, independent small threads which can be combined to utilize the target architecture as fully as possible.
1. Introduction

This paper contributes to the design of processor arrays for regular algorithms. We derive an optimization problem for generating a scheduling function that leads to the minimum latency of the processor array. Both a limited number of modules implementing the operations in the processors and a limited number of available registers in the processors can be taken into account. The main feature of our approach is its support for a partitioning of the resulting processor array while keeping various degrees of freedom in defining the partitions. To this end, we assume that the sequence of operations in each processor of the full-size array is divided into tasks whose length equals one iteration interval. We then derive constraints that allow the parallel tasks of several processors of the full-size array to be evaluated in one processor of the partitioned processor array in an arbitrary order without causality conflicts. In the course of the paper, we consider two scheduling functions. The first, called the uniform affine scheduling function, is well studied in approaches to resource-constrained scheduling. The second we call the quasi uniform scheduling function, since a floor operator is used to ensure, in addition to a periodic schedule in each processor, that every processor performs the same operation at each time step. This allows operations to be exchanged between processors in an arbitrary manner. An additional constraint ensures, for both scheduling functions, that entire iterations can be exchanged between processors and that parallel iterations can be evaluated sequentially in one processor. Thus, a straightforward partitioning of the processor array is possible. Furthermore, the approach can be used to ensure an optimal utilization of a multiprocessor system by regarding each iteration as a thread that can be distributed arbitrarily among the available processors.
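To make the notion of a uniform affine scheduling function concrete, the following sketch uses its standard form from the scheduling literature: each iteration point I receives a start time t(I) = λ·I + γ, and causality requires λ·d ≥ 1 for every dependence vector d. The exact definitions and constraints used in this paper may differ; the vectors lam, gamma, and the dependences below are hypothetical example values, not taken from the paper.

```python
# Illustrative sketch of a uniform affine scheduling function.
# lam, gamma, and the dependence vectors are hypothetical examples.

def schedule(lam, gamma, I):
    """Start time t(I) = lam . I + gamma of iteration point I."""
    return sum(l * i for l, i in zip(lam, I)) + gamma

def is_causal(lam, dependences):
    """Causality: every dependence vector d must satisfy lam . d >= 1,
    i.e. a result is produced at least one time step before its use."""
    return all(sum(l * d for l, d in zip(lam, dep)) >= 1
               for dep in dependences)

# Example: a 2-D iteration space with dependences (1,0) and (0,1).
lam, gamma = (2, 1), 0
deps = [(1, 0), (0, 1)]
assert is_causal(lam, deps)
assert schedule(lam, gamma, (3, 4)) == 10   # 2*3 + 1*4 + 0
```

The quasi uniform scheduling function described above additionally applies a floor operator to such an affine expression, so that the schedule stays periodic per processor while all processors execute the same operation at each time step; its precise form is developed later in the paper.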