GPGPU programming promises high performance, but to achieve it developers must overcome several challenges. The main ones are: writing and using massively parallel GPU kernels, managing memory transfers between the CPU and the GPU, and composing kernels while preserving the performance of each component and optimizing global performance. In this article, we study kernel composition by distinguishing where it takes place: kernel composition on the GPU, kernel generation by the CPU, and overall composition. To do so, we use the SPOC library, developed in OCaml, which offers abstractions over the CUDA and OpenCL frameworks. SPOC provides a dedicated language, Sarek, for expressing kernels, together with several parallel skeletons for composing them. We show that raising the level of abstraction used to handle kernels makes programs easier to write and enables optimizations (via kernel generation and transfer scheduling). Thus, we gain on both fronts: expressiveness and efficiency.