This paper presents a new technique for minimizing the I/O delay in the architectural synthesis of cyclic data flow graphs (DFGs) representing DSP algorithms, taking inter-processor communication delays into consideration. It addresses the question of optimizing the I/O delay without sacrificing the iteration period (throughput) when the inter-processor communication overhead is non-negligible. The proposed technique operates on the cyclic DFG of a DSP algorithm and evaluates the relative firing times of the nodes by using the Floyd-Warshall longest path algorithm, so that the inter-processor communication overhead is taken into account in producing an optimized time and processor schedule. The proposed scheme is applied to well-known DSP benchmarks and is shown to be efficient in minimizing the I/O delay without sacrificing the iteration period.
I. INTRODUCTION
Digital signal processing (DSP), communications, and image processing are among the applications that need high-level synthesis. They demand high computational power and must be executed at very high speed to enable real-time processing. Because of the parallelism inherent in DSP applications, parallel processing architectures are a natural choice for their synthesis. The problem of architectural synthesis for iterative algorithms has received a great deal of attention in recent years. However, most techniques for multiprocessor scheduling deal with a simplified problem in which the time to communicate data from one processor to another (the interconnect delay) is not taken into consideration [1], which leads to unrealistic schedules. Well-known examples of such techniques are a cyclo-static scheduling method that uses exhaustive search [2] and the optimum unfolding technique [3]. Neither of these methods considers the inter-processor communication delays (ICDs). Curtis and Madisetti [4] have shown that including the ICDs is essential in the realistic development of multiprocessor schedules. Their objective was to use a realistic structural- and behavioral-level description of a DSP algorithm, one that takes the ICDs into consideration, to find schedules that are simultaneously rate-optimal and processor-optimal. They developed the so-called DSMP-C1 method for this purpose. However, this method is computationally intensive, practical only for small DSP algorithms, and suitable for homogeneous multiprocessor systems. The techniques proposed in [5, 6] use integer linear programming (ILP) to account for the inter-processor communication delay when scheduling DSP applications mapped onto pre-defined homogeneous multiprocessor systems with different topologies