The embedded DSP blocks in modern Field Programmable Gate Arrays (FPGAs) are highly capable and support a variety of different datapath configurations. These evolved to support a range of applications requiring significant amounts of fast arithmetic. In addition to their computational capabilities, DSP blocks support run-time dynamic programmability, which allows a single DSP block to be used as a different computational block in every clock cycle. Vendor synthesis tools can infer the use of these resources, but they do not exploit their full capabilities, especially the dynamic configuration. Specific language structures are suggested for implementing standard applications, but designs that do not fit these standard templates can suffer from inefficient mapping. High-level synthesis (HLS) tools rely on the backend synthesis tools to map efficiently to the target architecture. This thesis explores how DSP blocks can be exploited to produce high-throughput computational kernels that operate close to the theoretical limit of the primitives, and how their dynamic programmability can be exploited to create efficient implementations. We show that this can be achieved using a high-level description, but only by considering architectural information at higher levels. An automated tool flow is presented that takes a high-level description of a computational kernel in C and generates synthesisable Verilog that achieves performance close to the theoretical limits of the DSP block, comparable with hand-optimised designs. We extend this tool to support proposed techniques for resource sharing of DSP blocks, adapting traditional approaches to the high latency of the DSP blocks, and also applying multi-pumping in this new context. This detailed design results in circuits that always operate close to the theoretical limits and offer full utilisation of the DSP block.