CATs++: Boosting Cost Aggregation with Convolutions and Transformers

Cho, Seokju; Hong, Sunghwan; Kim, Seungryong

doi:10.48550/arxiv.2202.06817

Cited by 5 publications

(25 citation statements)

References 72 publications

“…7. We note the resolution which the method is evaluated, since [9,78] observe that the resolution of images affect the PCK performance, and the resolution of which the method outputs the correspondence field. It is shown that NeMF achieves competitive performance or even attains state-of-the-art performance for several alpha thresholds.…”

Section: Matching Results

mentioning

confidence: 99%
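The statement above reports PCK at several alpha thresholds. As a rough illustration of what that metric measures, here is a minimal sketch assuming the common definition: a keypoint counts as correct when its predicted location lies within alpha times the larger side of a reference box (object bounding box or image) of the ground truth. The function name `pck`, the `bbox_size` argument, and the toy keypoints are hypothetical and are not taken from any of the cited papers.

```python
import numpy as np

def pck(pred_kps, gt_kps, bbox_size, alpha=0.1):
    """Percentage of Correct Keypoints (illustrative definition).

    pred_kps, gt_kps: (N, 2) arrays of keypoint coordinates.
    bbox_size: (height, width) of the reference box; this is an assumption,
               since benchmarks differ on whether the box or the image is used.
    """
    pred_kps = np.asarray(pred_kps, dtype=float)
    gt_kps = np.asarray(gt_kps, dtype=float)
    threshold = alpha * max(bbox_size)                  # distance tolerance in pixels
    dists = np.linalg.norm(pred_kps - gt_kps, axis=1)   # per-keypoint Euclidean error
    return float(np.mean(dists <= threshold))

# Toy data: three keypoints with errors of about 2.2, 12 and 4 pixels.
gt = np.array([[10.0, 10.0], [50.0, 40.0], [80.0, 90.0]])
pred = np.array([[12.0, 11.0], [62.0, 40.0], [80.0, 94.0]])
for alpha in (0.05, 0.1, 0.15):
    print(alpha, pck(pred, gt, bbox_size=(100, 100), alpha=alpha))
```

Because the threshold scales with the reference size, the same pixel error can flip between correct and incorrect when the evaluation resolution changes, which is one reason the resolution is worth reporting alongside PCK.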

“…CHM [47] extends the PHM by employing high-dimensional convolutional kernels to aggregate 6D correlation maps. CATs [8] and its extension [9] use transformers [79,12] to explore global consensus from correlation maps thanks to transformers' ability to consider long-range interactions. All these works exploit rich semantics present at high-level features for robust matching across semantically similar images.…”

Section: Related Work

mentioning

confidence: 99%

“…interpolation [66,65,26,8,78,9] or TPS warping with sparse keypoints [48,50,47], significantly reducing localization precision in matching details. Instead of these hand-crafted designs, several works [25,77,19] attempted to formulate a coarse-to-fine approach by utilizing multi-level features, but they often suffer from the propagation of initial error from the early coarse level.…”

mentioning

confidence: 99%

Neural Matching Fields: Implicit Representation of Matching Fields for Visual Correspondence

Hong, Nam, Cho et al. 2022

Preprint

Existing pipelines for semantic correspondence commonly extract high-level semantic features for invariance against intra-class variations and background clutter. This architecture, however, inevitably results in a low-resolution matching field that additionally requires an ad-hoc interpolation process as a post-processing step to convert it into a high-resolution one, limiting the overall performance of the matching results. To overcome this, inspired by the recent success of implicit neural representations, we present a novel method for semantic correspondence, called Neural Matching Field (NeMF). However, the complexity and high dimensionality of a 4D matching field are major hindrances, so we propose a cost embedding network that processes a coarse cost volume and uses it as guidance for establishing a high-precision matching field through a subsequent fully-connected network. Nevertheless, learning a high-dimensional matching field remains challenging, mainly due to computational complexity, since a naïve exhaustive inference would require querying all pixels in the 4D space to infer pixel-wise correspondences. To overcome this, we propose adequate training and inference procedures: in the training phase, we randomly sample matching candidates, and in the inference phase, we iteratively perform PatchMatch-based inference and coordinate optimization at test time. With these combined, competitive results are attained on several standard benchmarks for semantic correspondence. Code and pre-trained weights are available at https://ku-cvlab.github.io/NeMF/.

Introduction

Establishing visual correspondence across semantically similar images is a fundamental problem in computer vision, which has been facilitating many applications including visual localization [69,38], structure-from-motion [70], image editing [1] and autonomous driving [33]. Unlike traditional dense correspondence tasks [20,23], where visually similar images of the same scene are used as inputs, the semantic correspondence problem poses additional challenges due to intra-class appearance and severe geometry variations among object instances [15,16].
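The abstract above refers to a coarse cost volume and a 4D matching field. The following is a minimal sketch of that general idea only, not the NeMF implementation: it assumes two feature maps of shape (H, W, C), uses cosine similarity as the matching cost, and reads off the best target cell for each source cell by argmax. The names `cost_volume` and `argmax_matches` and the rolled toy features are hypothetical.

```python
import numpy as np

def cost_volume(feat_src, feat_tgt, eps=1e-8):
    """Cosine-similarity cost volume of shape (Hs, Ws, Ht, Wt)."""
    # L2-normalise each feature vector so the dot product is a cosine similarity.
    src = feat_src / (np.linalg.norm(feat_src, axis=-1, keepdims=True) + eps)
    tgt = feat_tgt / (np.linalg.norm(feat_tgt, axis=-1, keepdims=True) + eps)
    return np.einsum("ijc,klc->ijkl", src, tgt)

def argmax_matches(cost):
    """For every source cell, return the (row, col) of the best target cell."""
    hs, ws, ht, wt = cost.shape
    flat = cost.reshape(hs, ws, ht * wt).argmax(axis=-1)
    return np.stack(np.unravel_index(flat, (ht, wt)), axis=-1)  # (Hs, Ws, 2)

# Toy check: the target features are the source features shifted one column right.
rng = np.random.default_rng(0)
feat_src = rng.standard_normal((4, 5, 16))
feat_tgt = np.roll(feat_src, 1, axis=1)
cost = cost_volume(feat_src, feat_tgt)
matches = argmax_matches(cost)
print(cost.shape, matches[0, 0])
```

The matches produced this way live on the coarse feature grid, which is the low-resolution limitation the abstract describes; the volume also grows with the product of both grids, which is why exhaustive querying at full resolution is expensive.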

“…We believe that PCF-Net points out a novel direction for solving correspondence problems: learning reliable and geometric-invariant probabilistic coordinate representations. Future research directions include further optimization through cost aggregation [64] and graph matching [65].…”

Section: Discussion

mentioning

confidence: 99%

Learning Complete and Discriminative Direction Pattern for Robust Palmprint Recognition

Zhao, Zhang 2021

IEEE Trans. on Image Process.

We introduce Probabilistic Coordinate Fields (PCFs), a novel geometric-invariant coordinate representation for image correspondence problems. In contrast to standard Cartesian coordinates, PCFs encode coordinates in correspondence-specific barycentric coordinate systems (BCS) with affine invariance. To know when and where to trust the encoded coordinates, we implement PCFs in a probabilistic network termed PCF-Net, which parameterizes the distribution of coordinate fields as Gaussian mixture models. By jointly optimizing coordinate fields and their confidence conditioned on dense flows, PCF-Net can work with various feature descriptors when quantifying the reliability of PCFs by confidence maps. An interesting observation of this work is that the learned confidence map converges to geometrically coherent and semantically consistent regions, which facilitates robust coordinate representation. By delivering the confident coordinates to keypoint/feature descriptors, we show that PCF-Net can be used as a plug-in to existing correspondence-dependent approaches. Extensive experiments on both indoor and outdoor datasets suggest that accurate geometric invariant coordinates help to achieve the state of the art in several correspondence problems, such as sparse feature matching, dense image registration, camera pose estimation, and consistency filtering. Further, the interpretable confidence map predicted by PCF-Net can also be leveraged to other novel applications from texture transfer to multi-homography classification.
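The abstract above states that barycentric coordinate systems give affine invariance. A minimal sketch of that property, assuming a single 2D triangle as the reference frame (this is not PCF-Net itself, and the function name `barycentric`, the triangle, and the transform are hypothetical): the barycentric weights of a point stay the same when the point and the triangle undergo the same invertible affine map.

```python
import numpy as np

def barycentric(p, tri):
    """Barycentric weights of 2D point p with respect to triangle tri (3, 2)."""
    a, b, c = tri
    # Solve p - a = l1 * (b - a) + l2 * (c - a) for (l1, l2).
    basis = np.column_stack([b - a, c - a])
    l1, l2 = np.linalg.solve(basis, p - a)
    return np.array([1.0 - l1 - l2, l1, l2])  # weights for a, b, c

tri = np.array([[0.0, 0.0], [4.0, 0.0], [1.0, 3.0]])
p = np.array([1.5, 1.0])

# An arbitrary invertible affine map x -> A @ x + t (illustrative values).
A = np.array([[2.0, 0.5], [-1.0, 1.5]])
t = np.array([3.0, -2.0])
tri_warped = tri @ A.T + t
p_warped = A @ p + t

w_before = barycentric(p, tri)
w_after = barycentric(p_warped, tri_warped)
print(w_before, w_after)
```

The Cartesian coordinates of `p` change under the map while the weights do not, which is the sense in which such a representation is geometry-invariant.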

“…Due to these unconstrained settings, it should handle the additional challenges of large intra-class variations in appearance and background clutter. Recent deep learning-based matching models (Min et al 2019a; Liu et al 2020; Li et al 2020a; Li et al 2021; Zhao et al 2021; Min et al 2020; Cho et al 2021; Cho, Hong, and Kim 2022), following data-driven approach, were generally trained in a supervised fashion based on datasets (Ham et al…”

Section: Introduction

mentioning

confidence: 99%

College students’ perceptions of AI-based writing learning tools: With a focus on Google Translate, Naver Papago, and Grammarly

Kim, Han 2021

meeso

Image blending aims to combine multiple images seamlessly. It remains challenging for existing 2D-based methods, especially when input images are misaligned due to differences in 3D camera poses and object shapes. To tackle these issues, we propose a 3D-aware blending method using generative Neural Radiance Fields (NeRF), including two key components: 3D-aware alignment and 3D-aware blending. For 3D-aware alignment, we first estimate the camera pose of the reference image with respect to generative NeRFs and then perform 3D local alignment for each part. To further leverage 3D information of the generative NeRF, we propose 3D-aware blending that directly blends images on the NeRF's latent representation space, rather than raw pixel space. Collectively, our method outperforms existing 2D baselines, as validated by extensive quantitative and qualitative evaluations with FFHQ and AFHQ-Cat.
