Existing pipelines for semantic correspondence commonly extract high-level semantic features to gain invariance against intra-class variations and background clutter. This architecture, however, inevitably results in a low-resolution matching field that additionally requires an ad-hoc interpolation step as post-processing to convert it into a high-resolution one, which limits the overall quality of the matching results. To overcome this, inspired by the recent success of implicit neural representations, we present a novel method for semantic correspondence, called Neural Matching Field (NeMF). However, the complexity and high dimensionality of a 4D matching field are major hindrances, so we propose a cost embedding network that processes a coarse cost volume, which then serves as guidance for establishing a high-precision matching field through a subsequent fully-connected network. Nevertheless, learning a high-dimensional matching field remains challenging, mainly due to computational complexity, since a naïve exhaustive inference would require querying all pixels in the 4D space to infer pixel-wise correspondences. To overcome this, we propose suitable training and inference procedures: in the training phase, we randomly sample matching candidates, and in the inference phase, we iteratively perform PatchMatch-based inference and coordinate optimization at test time. With these combined, competitive results are attained on several standard benchmarks for semantic correspondence. Code and pre-trained weights are available at https://ku-cvlab.github.io/NeMF/.
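To give a rough sense of the computational gap described above, the following sketch counts how many matching-field evaluations a naïve exhaustive inference would need, compared with a scheme that scores only a few candidates per pixel over a few iterations. It is a back-of-the-envelope illustration, not the NeMF implementation: the function names, the resolution, the number of candidates, and the number of iterations are all hypothetical values chosen for the example.

```python
def exhaustive_queries(height: int, width: int) -> int:
    """Evaluations needed if every source pixel is scored against
    every target pixel, i.e. the full 4D space is queried."""
    return (height * width) ** 2


def sampled_queries(height: int, width: int,
                    candidates: int, iterations: int) -> int:
    """Evaluations needed if each source pixel scores only a fixed
    number of candidate matches in each iteration."""
    return height * width * candidates * iterations


if __name__ == "__main__":
    # Hypothetical settings, not taken from the paper.
    h, w = 256, 256
    full = exhaustive_queries(h, w)
    sparse = sampled_queries(h, w, candidates=8, iterations=10)
    print(f"exhaustive: {full} queries")
    print(f"sampled:    {sparse} queries")
    print(f"reduction:  about {full / sparse:.0f}x")
```

The exhaustive count grows quadratically with the number of pixels, whereas the sampled count grows linearly, which is why candidate sampling during training and an iterative candidate-based search at inference become attractive at high resolution.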
Introduction

Establishing visual correspondence across semantically similar images is a fundamental problem in computer vision that has facilitated many applications, including visual localization [69,38], structure-from-motion [70], image editing [1], and autonomous driving [33]. Unlike traditional dense correspondence tasks [20,23], where visually similar images of the same scene are used as inputs, the semantic correspondence problem poses additional challenges due to intra-class appearance and severe geometric variations among object instances [15,16].