The constant push for feature richness in mobile and embedded devices has significantly increased computational demand, yet stringent energy constraints typically remain in place. Embedding processor cores in FPGAs offers a path to customized instruction-set processors that can meet these performance and energy demands. Ideally, the customization process is automated, reducing design effort and, indirectly, time to market. However, the automatic generation of custom extensions for floating-point computation remains a challenge in FPGA co-design. We propose an approach for accelerating such computation via application-specific SIMD extensions. We describe an automated co-design toolchain that generates both code and application-specific platform extensions implementing SIMD instructions with a parameterizable number of vector elements. The parallelism exposed by encapsulating computation in vector instructions is matched to an adjustable pool of execution units. Experiments on real hardware show significant performance improvements. Our framework extends the capabilities of FPGA-embedded processors, which have traditionally handled bit-level, integer, and low-intensity floating-point code, to vectorizable floating-point computation.
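To make the target workload concrete, the sketch below shows a hypothetical vectorizable single-precision kernel of the kind such a toolchain could accelerate; this is a minimal illustration, not the toolchain's actual input or generated code. The function name, the assumed vector length VL, and the strip-mined loop structure are assumptions introduced here for exposition only.

```c
#include <stddef.h>
#include <stdio.h>

#define VL 4  /* assumed parameterizable number of vector elements */

/* Scalar AXPY kernel: y[i] += a * x[i]. In a co-design flow of the kind
 * described above, the strip-mined inner iterations could be encapsulated
 * in an application-specific SIMD instruction executed by an adjustable
 * pool of floating-point execution units. */
static void saxpy(float a, const float *x, float *y, size_t n)
{
    size_t i = 0;
    /* Strip-mined main loop: each outer iteration corresponds to one
     * VL-wide vector operation on the customized processor. */
    for (; i + VL <= n; i += VL)
        for (size_t j = 0; j < VL; ++j)
            y[i + j] += a * x[i + j];
    /* Scalar epilogue for the remaining n % VL elements. */
    for (; i < n; ++i)
        y[i] += a * x[i];
}

int main(void)
{
    float x[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float y[8] = {0};
    saxpy(2.0f, x, y, 8);
    printf("%f %f\n", y[0], y[7]);  /* expected: 2.000000 16.000000 */
    return 0;
}
```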