A typical embedded application can be viewed as a mixture of computation-intensive and control-intensive parts. Existing coarse-grained reconfigurable array architectures deliver high performance on the computation-intensive parts but cannot handle the control-intensive parts efficiently, which degrades overall performance. This paper presents an approach that overcomes this limitation by using kernel-level speculative execution. Simulation results show that, compared with a conventional software implementation, our approach improves the average performance of the deblocking filter by more than 18 times for a luma macroblock and by more than 42 times for a chroma macroblock.