Multi-core architectures can deliver high processing power if the multiple levels of parallelism they expose are exploited. However, orchestrating the allocation of computational and memory resources is non-trivial. Furthermore, on distributed-memory architectures, data distribution adds another level of complexity. This paper presents a model-driven technique for mapping multi-task parallel programs onto multi-core platforms. Resource allocation configurations are expressed within a three-dimensional optimization space that covers task-level parallelism, data-level parallelism, and communication. The performance of any valid parallelization scheme is then estimated statically, taking into account both computation and communication costs. We prototyped our approach on the Cell Broadband Engine (Cell BE) with an image processing application. Experiments show that our model correctly predicts the overall performance and identifies the most efficient parallelization schemes.
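To make the idea of statically exploring a configuration space more concrete, here is a minimal sketch. It is not the paper's cost model. The three knobs (cores per task, number of data chunks, and whether transfers are double-buffered), the `Task` fields, the per-core bandwidth, the DMA latency, the local-memory budget, and the assumption that tasks form a pipeline on dedicated cores are all hypothetical stand-ins chosen for illustration. Each configuration is rejected if its chunks do not fit in local memory. Otherwise it is scored from its computation and communication costs, and the pipeline's period is taken as that of its slowest stage.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Task:
    name: str
    compute_ms: float  # hypothetical: compute time of the task on one core, per image
    data_kb: float     # hypothetical: data the task transfers per image


def stage_time(task, cores, chunks, double_buffer,
               bw_kb_per_ms, dma_latency_ms, local_store_kb):
    """Estimated time of one pipeline stage, or None if the chunks do not fit."""
    # Data is split evenly across the task's cores, then into chunks per core.
    chunk_kb = task.data_kb / (cores * chunks)
    # With double buffering, two chunks must be resident in local memory at once.
    buffers = 2 if double_buffer else 1
    if buffers * chunk_kb > local_store_kb:
        return None
    compute = task.compute_ms / cores
    # Assumption: each core has its own bandwidth; every chunk pays a fixed latency.
    comm = task.data_kb / cores / bw_kb_per_ms + chunks * dma_latency_ms
    # Overlapping transfers with computation hides the cheaper of the two costs.
    return max(compute, comm) if double_buffer else compute + comm


def best_scheme(tasks, num_cores, chunk_options, **platform):
    """Exhaustively score every valid scheme; return (period, cores, chunks, double_buffer)."""
    best = None
    # Assumption: every task gets at least one dedicated core.
    for cores in product(range(1, num_cores + 1), repeat=len(tasks)):
        if sum(cores) > num_cores:
            continue
        for chunks, db in product(chunk_options, (False, True)):
            times = [stage_time(t, c, chunks, db, **platform)
                     for t, c in zip(tasks, cores)]
            if None in times:
                continue  # invalid: some stage's buffers exceed local memory
            period = max(times)  # a pipeline runs at the pace of its slowest stage
            if best is None or period < best[0]:
                best = (period, cores, chunks, db)
    return best


if __name__ == "__main__":
    platform = dict(bw_kb_per_ms=100.0, dma_latency_ms=0.01, local_store_kb=256.0)
    tasks = [Task("filter", 8.0, 1024.0), Task("threshold", 2.0, 1024.0)]
    print(best_scheme(tasks, num_cores=4, chunk_options=(4, 16, 64), **platform))
```

Exhaustive enumeration is used only because the toy space is tiny. The point is that every candidate is scored without running it, which is what makes a static estimate useful for ranking parallelization schemes.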