We investigate the impact of level-1 cache (CL1) parameters, level-2 cache (CL2) parameters, and cache organizations on the power consumption and performance of multi-core systems. We simulate two 4-core architectures -both with private CL1s, but one with shared CL2 and the other one with private CL2s. Simulation results with MPEG4, H.264, matrix inversion, and DFT workloads show that reductions in total power consumption and mean delay per task of up to 42% and 48%, respectively, are possible with optimized CL1s and CL2s. Total power consumption and the mean delay per task depend significantly on the applications including the code size and locality.Index Terms-Cache memory organization, homogeneous systems, low-power design, multi-core processor