“…Recent literature contains many attempts, from different perspectives, to drive a static portrait with a video or audio signal. One set of methods [15,38,65,72] takes advantage of 3D Morphable Models (3DMMs), parametric models that decompose a face into expression, pose, and identity, to transfer facial motion. In the audio-driven case, these methods project the audio features into the parameter space of the 3DMM [65,67,70].…”
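
To make the decomposition concrete, the following is a minimal sketch of how a linear 3DMM could separate identity, expression, and pose, and how motion could then be transferred by swapping parameters. It is an illustration under stated assumptions, not the formulation used by any of the cited methods: the basis arrays are random placeholders, the sizes are toy values, and the names `reconstruct`, `transfer_motion`, and `rotation_z` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes chosen for illustration only; real 3DMMs are far larger.
N_VERTS, N_ID, N_EXP = 5, 4, 3

# Hypothetical random placeholders; a real 3DMM provides a learned mean and bases.
MEAN = rng.normal(size=(N_VERTS, 3))
ID_BASIS = rng.normal(size=(N_VERTS, 3, N_ID))
EXP_BASIS = rng.normal(size=(N_VERTS, 3, N_EXP))


def rotation_z(angle):
    """Rotation matrix about the z axis, used here as a stand-in for head pose."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def reconstruct(params):
    """Build posed vertices from identity, expression, and pose parameters."""
    # Linear shape model: mean face plus identity and expression offsets.
    shape = MEAN + ID_BASIS @ params["identity"] + EXP_BASIS @ params["expression"]
    # Rigid pose: rotate each vertex, then translate.
    return shape @ params["rotation"].T + params["translation"]


def transfer_motion(source, driving):
    """Keep the source identity; take expression and pose from the driving frame."""
    return {
        "identity": source["identity"],
        "expression": driving["expression"],
        "rotation": driving["rotation"],
        "translation": driving["translation"],
    }


source = {
    "identity": rng.normal(size=N_ID),
    "expression": np.zeros(N_EXP),
    "rotation": np.eye(3),
    "translation": np.zeros(3),
}
driving = {
    "identity": rng.normal(size=N_ID),
    "expression": rng.normal(size=N_EXP),
    "rotation": rotation_z(0.3),
    "translation": np.array([0.0, 0.1, 0.0]),
}

animated = transfer_motion(source, driving)
vertices = reconstruct(animated)  # array of shape (N_VERTS, 3)
```

Under the same assumptions, an audio-driven variant would replace the `driving` dictionary with expression and pose parameters predicted from audio features by a learned model, which this sketch does not include.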