2022
DOI: 10.1007/978-3-031-19790-1_40

FlowFormer: A Transformer Architecture for Optical Flow

Abstract: This paper introduces a novel transformer-based network architecture, FlowFormer, along with the Masked Cost Volume AutoEncoding (MCVA) for pretraining it to tackle the problem of optical flow estimation. FlowFormer tokenizes the 4D cost-volume built from the source-target image pair and iteratively refines flow estimation with a cost-volume encoder-decoder architecture. The cost-volume encoder derives a cost memory with alternate-group transformer (AGT) layers in a latent space and the decoder recurrently dec…
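The pipeline sketched in the abstract (build a 4D all-pairs cost volume from the image pair, project it into latent tokens, then decode a flow estimate) can be illustrated with a minimal PyTorch-style sketch. This is an illustrative assumption, not the authors' implementation; names such as all_pairs_cost_volume and CostVolumeTokenizer, and the choice of a single linear projection, are hypothetical.

```python
# Minimal sketch of cost-volume construction and tokenization (assumed design,
# not FlowFormer's actual code).
import torch
import torch.nn as nn


def all_pairs_cost_volume(feat_src, feat_tgt):
    """Build a (B, H1*W1, H2*W2) cost volume from source/target feature maps."""
    b, c, h1, w1 = feat_src.shape
    f1 = feat_src.flatten(2)                      # (B, C, H1*W1)
    f2 = feat_tgt.flatten(2)                      # (B, C, H2*W2)
    cost = torch.einsum("bci,bcj->bij", f1, f2)   # dot product over channels
    return cost / c ** 0.5                        # scale for numerical stability


class CostVolumeTokenizer(nn.Module):
    """Project each source pixel's cost map into a latent token (hypothetical)."""

    def __init__(self, num_target_pixels, token_dim=256):
        super().__init__()
        self.proj = nn.Linear(num_target_pixels, token_dim)

    def forward(self, cost):                      # cost: (B, H1*W1, H2*W2)
        return self.proj(cost)                    # tokens: (B, H1*W1, token_dim)


# Example: feature maps at 1/8 resolution of a 64x64 image pair.
feat1 = torch.randn(1, 128, 8, 8)
feat2 = torch.randn(1, 128, 8, 8)
cost = all_pairs_cost_volume(feat1, feat2)                  # (1, 64, 64)
tokens = CostVolumeTokenizer(num_target_pixels=64)(cost)    # (1, 64, 256)
```

In the paper's described architecture these tokens would then be processed by the AGT encoder and decoded recurrently into flow; the sketch above stops at tokenization.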


Cited by 85 publications (42 citation statements) · References 81 publications
“…• We demonstrate the effectiveness of DistractFlow in supervised [6,14,30] and semi-supervised settings, showing that DistractFlow outperforms the very recent FlowSupervisor [9], which requires additional in-domain unlabeled data.…”
Section: Introduction (mentioning)
confidence: 91%
“…Optical Flow Estimation: Several deep architectures have been proposed for optical flow [4,8,23,29,30,38]. Among these, Recurrent All-Pairs Field Transforms (RAFT) [30] has shown significant performance improvement over previous methods, inspiring many subsequent works [6,14,26,27,35]. Following the structure of the RAFT architecture, complementary studies [12,14,33,35,39] proposed advancements on feature extraction, the 4D correlation volume, recurrent update blocks, and, more recently, transformer extensions [6,39].…”
Section: Related Work (mentioning)
confidence: 99%
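The RAFT-style pattern referenced in this excerpt (query a correlation volume and let a recurrent update block predict residual flow) can be sketched as follows. This is a toy illustration, not code from RAFT or FlowFormer; UpdateBlock and refine_flow are hypothetical names, and the correlation features are kept fixed across iterations for brevity, whereas RAFT re-samples them around the current flow estimate at every step.

```python
# Illustrative recurrent refinement loop (assumed, simplified design).
import torch
import torch.nn as nn


class UpdateBlock(nn.Module):
    """Toy update block: (hidden state, correlation features, flow) -> delta flow."""

    def __init__(self, corr_dim, hidden_dim=96):
        super().__init__()
        self.gru = nn.GRUCell(corr_dim + 2, hidden_dim)
        self.flow_head = nn.Linear(hidden_dim, 2)

    def forward(self, hidden, corr_feat, flow):
        x = torch.cat([corr_feat, flow], dim=-1)  # per-pixel input features
        hidden = self.gru(x, hidden)              # recurrent state update
        return hidden, self.flow_head(hidden)     # predict residual flow


def refine_flow(corr_feat, iters=8, hidden_dim=96):
    """Iteratively refine a per-pixel 2D flow field starting from zero."""
    num_pixels, corr_dim = corr_feat.shape
    block = UpdateBlock(corr_dim, hidden_dim)
    flow = torch.zeros(num_pixels, 2)
    hidden = torch.zeros(num_pixels, hidden_dim)
    for _ in range(iters):
        hidden, delta = block(hidden, corr_feat, flow)
        flow = flow + delta                       # residual refinement
    return flow


# Example: 64 pixels with 32-dimensional correlation features.
flow = refine_flow(torch.randn(64, 32))           # (64, 2)
```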
“…Attempts have been made to develop effective VFI methods [1,2,6,9-27]. In particular, with the advances in optical flow estimation [28-37], motion-based VFI methods provide remarkable performance. But VFI for high-resolution videos, e.g.…”
Section: Introduction (mentioning)
confidence: 99%