Runtime reconfiguration is a promising solution for reducing hardware cost in embedded systems, without compromising on performance. We present a framework that aims to increase the advantages of runtime reconfiguration on reconfigurable processors that support full or partial runtime reconfiguration. The proposed framework incorporates a hierarchical loop partitioning strategy that leverages FPGAaware merging of custom instructions to: 1) maximize the reconfigurable logic block utilization in each configuration, and 2) reduce the runtime reconfiguration overhead. Experimental results show that the proposed strategy leads to over 39% average reduction in runtime reconfiguration overhead for partial runtime reconfiguration. In addition, the proposed strategy leads to an average performance gain of over 32% and 34% for full and partial runtime reconfiguration respectively.