Mesh optimization is essential to enable sufficient element quality for numerical methods such as the finite element method (FEM). Depending on the required accuracy and geometric detail, a mesh with many elements is necessary to resolve small-scale details. Sequential optimization of large meshes often imposes long run times. This is especially an issue for Delaunay-based methods. Recently, the notion of harmonic triangulations [1] was evaluated for tetrahedral meshes, revealing significantly faster run times than competing Delaunay-based methods. A crucial aspect for efficiency and high element quality is boundary treatment. We investigate directional derivatives for boundary treatment and massively parallel GPUs for mesh optimization. Parallel flipping achieves compelling speedups by up to $$318\times $$
318
×
. We accelerate harmonic mesh optimization by $$119\times $$
119
×
for boundary preservation and $$78\times $$
78
×
for moving every boundary vertex, while producing superior mesh quality.