2021 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv48922.2021.00615
COTR: Correspondence Transformer for Matching Across Images

Cited by 157 publications (58 citation statements)
References 46 publications
“…For those works addressing visual correspondence, LoFTR [27] uses cross- and self-attention modules to refine the feature maps conditioned on both input images, and formulates a hand-crafted aggregation layer with dual-softmax [19], [57] and Optimal Transport [25] to infer correspondences. In another work, COTR [58] takes coordinates as input and addresses dense correspondence tasks without the use of a correlation map. Unlike these, for the first time, we propose a transformer-based cost aggregation module.…”
Section: Transformers in Vision
confidence: 99%
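To make the dual-softmax matching mentioned in the statement above concrete, here is a minimal PyTorch sketch: a similarity matrix between the two images' descriptors is normalized with a softmax along each axis, the two are multiplied into a confidence matrix, and mutual nearest neighbors are kept as matches. The temperature value and function name are illustrative, not LoFTR's exact implementation.

```python
import torch

def dual_softmax_matching(feat_a, feat_b, temperature=0.1):
    """Dual-softmax matching sketch: feat_a (N, D) and feat_b (M, D)
    are L2-normalized descriptors from the two images."""
    sim = feat_a @ feat_b.t() / temperature          # (N, M) similarity matrix
    conf = sim.softmax(dim=1) * sim.softmax(dim=0)   # softmax along both axes
    # Keep mutual nearest neighbors: row and column argmax must agree.
    mutual = (conf == conf.max(dim=1, keepdim=True).values) & \
             (conf == conf.max(dim=0, keepdim=True).values)
    return mutual.nonzero(), conf                    # match indices, confidences
```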
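The same statement highlights COTR's functional formulation: a query coordinate in one image, together with both images, is mapped directly to the corresponding coordinate, with no explicit correlation map. The sketch below illustrates that interface only; the one-convolution stand-in backbone, layer sizes, and class name are assumptions for brevity, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class CoordQueryMatcher(nn.Module):
    """Hypothetical COTR-style interface: given two images and a query
    coordinate in image A, regress the matching coordinate in image B.
    Positional encodings are omitted for brevity."""
    def __init__(self, dim=256):
        super().__init__()
        self.backbone = nn.Conv2d(3, dim, kernel_size=16, stride=16)  # stand-in encoder
        self.coord_embed = nn.Linear(2, dim)                          # embed (x, y) query
        layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=3)
        self.head = nn.Linear(dim, 2)                                 # regress (x', y')

    def forward(self, img_a, img_b, query_xy):
        # Joint context: features of both images, concatenated side by side.
        ctx = torch.cat([self.backbone(img_a), self.backbone(img_b)], dim=-1)
        ctx = ctx.flatten(2).transpose(1, 2)          # (B, tokens, dim) sequence
        q = self.coord_embed(query_xy).unsqueeze(1)   # (B, 1, dim) coordinate query
        out = self.decoder(q, ctx)                    # query attends over joint context
        return self.head(out).squeeze(1)              # (B, 2) predicted match in image B
```

Because the query is just a coordinate, correspondences can be requested for arbitrary points, which is what lets this formulation avoid building a dense correlation volume.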
“…[1] propose a sparse correspondence method for inter-class scenarios, leveraging pre-trained CNN features. Recent works employ transformers for dense correspondence in intra-class pairs [8, 47, 22]. However, those methods fail to find meaningful correspondences under significant pose, scale, and appearance changes.…”
Section: Related Work
confidence: 99%
“…They work well for continuous frames but are inadequate for image pairs with large displacements. Very recently, the concurrent works [39, 16] incorporate global context between matches by using transformers [42], which have achieved great success in many NLP and vision tasks [11, 6, 51] thanks to the attention mechanism. Different from these works, we propose to adopt sparse correspondences as a prior and design lightweight network layers to efficiently propagate the contextual information to all image points, allowing dense correspondences to be predicted for arbitrary points.…”
Section: Related Work
confidence: 99%
“…But the key difference is that we propose a more sophisticated graph to model multi-level contexts using sparse correspondences as a prior, and develop a general architecture to infuse the contextual information into local features. In our graph-structured network, the message-passing layers are implemented with the attention mechanism of the Transformer [42], which is also used by some recent works [39, 16].…”
Section: Related Work
confidence: 99%
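As a concrete illustration of the attention-based message passing described in the two statements above, here is a minimal PyTorch sketch: dense per-point features gather context from the features of sparse seed matches via cross-attention, so the correspondence prior propagates to all image points. The class name, residual layout, and dimensions are assumptions, not the cited papers' exact design.

```python
import torch
import torch.nn as nn

class SeedMessagePassing(nn.Module):
    """Illustrative attention-based message-passing layer: dense per-point
    features attend to sparse seed-match features to absorb their context."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, point_feats, seed_feats):
        # point_feats: (B, N, dim) features for all image points
        # seed_feats:  (B, K, dim) features of the sparse correspondence prior
        msg, _ = self.attn(point_feats, seed_feats, seed_feats)  # messages from seeds
        x = self.norm1(point_feats + msg)                        # residual update
        return self.norm2(x + self.mlp(x))                       # (B, N, dim)
```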