The divisible load scheduling of image processing applications on heterogeneous star and multi-level tree networks is addressed in this paper. In our platforms, processors and network links have different speeds; in addition, computation and communication overheads are considered. A new genetic algorithm for minimizing the processing time of low-level image applications using divisible load theory is introduced. Closed-form solutions for the processing time, the image fractions that should be allocated to each processor, the optimum number of participating processors, and the optimal sequence for load distribution are derived. The new concept of an equivalent processor in the tree network is introduced, and the effect of different image and kernel sizes on processing time and speedup is investigated. Finally, several numerical experiments are presented to demonstrate the efficiency of our algorithm.

KEYWORDS
divisible load scheduling, equivalent processor, genetic algorithm, image fractions, load distribution sequence, local operation
INTRODUCTION

Divisible load scheduling is a special class of data parallelization methods that can be used in applications whose workload can be divided into any number of independent fractions. These fractions can be processed in parallel on different processors. Big dataset processing, matrix computation, signal and image processing, the Hough transform, and experimental data processing are examples of such applications. Divisible load scheduling (DLS) has been studied extensively in the last two decades because of its simplicity and analytical tractability.1 Image processing applications need extensive computational power that cannot be provided by a single processor.2 DLS is a good option for exploiting data parallelism in image applications3,4 because most image processing applications are divisible in nature. There are three kinds of operations in image processing: pixel operations, in which each pixel can be processed independently; local operations, in which the value of each output pixel is a function of the value of that pixel plus some neighboring pixels; and global operations, which need the information of the whole image to compute the value of one pixel. Pixel and local operations are good candidates for data parallelism. In this paper, we focus on local operators, as they are the most common operators in image applications.

We consider the star and tree networks as the target platforms. In these networks, the master processor holds the entire image and the kernel. It does not participate in image processing; it only partitions the workload and distributes it to the slave processors. In a star topology, slave processors begin to compute their image fractions only after completely receiving their workload from the master processor. Parent nodes in a tree topology can compute and communicate at the same time.
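To make the notion of a local operation concrete, the following is a minimal sketch of a kernel-based local operator (a naive 2-D convolution with a 3x3 averaging kernel). The image and kernel sizes here are illustrative choices, not values taken from the paper; the point is only that each output pixel depends on its input pixel and a small neighborhood, which is why the image can be split into independent fractions (plus a small border overlap) for parallel processing.

```python
import numpy as np

def convolve2d(image, kernel):
    """Naive 2-D convolution over the valid region: each output pixel
    is a function of the corresponding input pixel and its neighbors."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # illustrative 5x5 image
kernel = np.full((3, 3), 1.0 / 9.0)                # 3x3 averaging kernel
result = convolve2d(image, kernel)
print(result.shape)  # (3, 3)
```

Because only a border of kernel-height rows is shared between neighboring fractions, a master can send each slave a horizontal strip of the image plus that small overlap, and the strips can be processed fully independently.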
They are equipped with front-end processors in our model. The objective of using DLS for data parallelism in image applications…
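The sequential star-network distribution described above can be sketched with the classical divisible-load fractions: the master sends fractions one slave at a time, and the optimal fractions make all slaves finish simultaneously. This is a sketch of the standard textbook model only; the paper's own closed-form solution additionally accounts for computation and communication overheads, which are omitted here. The parameter names (`w` for compute time per unit load, `z` for link time per unit load, `Tcp`, `Tcm`) follow common DLS notation and are assumptions, not the paper's exact symbols.

```python
def star_fractions(w, z, Tcp=1.0, Tcm=1.0):
    """Load fractions for a star network with sequential distribution,
    no master computation, and no overheads: slave i starts computing
    after fully receiving its fraction, and all slaves finish together.
    Uses the recursion alpha_i = alpha_{i-1} * w_{i-1}*Tcp / (w_i*Tcp + z_i*Tcm)."""
    n = len(w)
    ratios = [1.0]  # each alpha_i expressed relative to alpha_0
    for i in range(1, n):
        ratios.append(ratios[-1] * w[i - 1] * Tcp / (w[i] * Tcp + z[i] * Tcm))
    total = sum(ratios)
    return [r / total for r in ratios]  # normalize: fractions sum to 1

# Three heterogeneous slaves (illustrative speeds, not from the paper).
alpha = star_fractions(w=[1.0, 2.0, 2.0], z=[0.2, 0.2, 0.4])
print([round(a, 3) for a in alpha])  # [0.545, 0.248, 0.207]
```

Note the design consequence: faster slaves (smaller `w`) and faster links (smaller `z`) receive larger fractions, and the equal-finish-time condition is what makes a closed-form solution possible.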