We propose a new software framework, named "decoupling architecture", for all-to-all computation. In this framework, the user's kernel code and the well-tuned parallel control code are clearly separated, which allows users to move smoothly from running sequential programs on a single-node server to fully utilizing a powerful machine such as the K computer. In an evaluation of our prototype, the proposed decoupling architecture introduced some overhead; however, we show that this overhead can be reduced.
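
The separation described above can be illustrated with a minimal sketch. This is not the framework's actual interface: the names `kernel`, `run_sequential`, and `run_parallel` are hypothetical, and a thread pool from the Python standard library stands in for the tuned parallel control code that a real system would provide for a machine such as the K computer. The point is only that the user-written kernel is identical under both drivers.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def kernel(a, b):
    """User-supplied kernel (hypothetical): compares one pair of items."""
    return abs(a - b)


def run_sequential(kernel, items):
    """Control code, sequential variant: apply the kernel to every pair."""
    pairs = combinations(range(len(items)), 2)
    return {(i, j): kernel(items[i], items[j]) for i, j in pairs}


def run_parallel(kernel, items, workers=4):
    """Control code, parallel variant: same pairs, dispatched to a pool.

    Assumption: threads stand in for the real distributed control layer.
    """
    pairs = list(combinations(range(len(items)), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda p: kernel(items[p[0]], items[p[1]]), pairs)
        return dict(zip(pairs, results))


if __name__ == "__main__":
    data = [1, 4, 9]
    # The kernel is untouched; only the control code changes.
    print(run_sequential(kernel, data))
    print(run_parallel(kernel, data))
```

Under this sketch, moving from the sequential to the parallel run means swapping the driver function only; the kernel needs no changes.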