Deformable image registration is still a challenge when the considered images have strong variations in appearance and large initial misalignment. A huge performance gap currently remains for fast-moving regions in videos or strong deformations of natural objects. We present a new semantically guided and two-step deep deformation network that is particularly well suited for the estimation of large deformations. We combine a U-Net architecture that is weakly supervised with segmentation information to extract semantically meaningful features with multiple stages of nonrigid spatial transformer networks parameterized with low-dimensional B-spline deformations. Combining alignment loss and semantic loss functions together with a regularization penalty to obtain smooth and plausible deformations, we achieve superior results in terms of alignment quality compared to previous approaches that have only considered a label-driven alignment loss. Our network model advances the state of the art for inter-subject face part alignment and motion tracking in medical cardiac magnetic resonance imaging (MRI) sequences in comparison to the FlowNet and Label-Reg, two recent deep-learning registration frameworks. The models are compact, very fast in inference, and demonstrate clear potential for a variety of challenging tracking and/or alignment tasks in computer vision and medical image analysis.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.