Traditionally, hardware designs partitioned across multiple FPGAs have had low performance due to the inefficiency of maintaining cycle-by-cycle timing among discrete FPGAs. In this paper, we present a mechanism by which complex designs may be efficiently and automatically partitioned among multiple FPGAs using explicitly programmed latency-insensitive links. We describe the automatic synthesis of an area efficient, high performance network for routing these inter-FPGA links. By mapping a diverse set of large research prototypes onto a multiple FPGA platform, we demonstrate that our tool obtains significant gains in design feasibility, compilation time, and even wall-clock performance.