In this paper we present the ESPRESO FEM library, which includes a FEM toolbox with interfaces to professional and open-source simulation tools, and a massively parallel Hybrid Total FETI (HTFETI) solver that can fully utilize the OLCF Titan supercomputer and achieves super-linear scaling. We introduce several new techniques for FETI solvers designed for efficient utilization of supercomputers, with a focus on (i) performance: we present a fivefold reduction of solver runtime for the Laplace equation, obtained by redesigning the FETI solver and offloading the key workload to the accelerator, and we compare Intel Xeon Phi 7120p and Tesla K80 and P100 accelerators to Intel Xeon E5-2680v3 and Xeon Phi 7210 CPUs; and (ii) memory efficiency: we present two techniques which increase the efficiency of the HTFETI solver 1.8 times and push the limit of the largest problem ESPRESO can solve from 124 to 223 billion unknowns for problems with unstructured meshes. Finally, we show that by dynamically tuning hardware parameters we can reduce energy consumption by up to 33 %.