2019 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2019.00356

Local Relation Networks for Image Recognition

Abstract: The convolution layer has been the dominant feature extractor in computer vision for years. However, the spatial aggregation in convolution is basically a pattern matching process that applies fixed filters, which are inefficient at modeling visual elements with varying spatial distributions. This paper presents a new image feature extractor, called the local relation layer, that adaptively determines aggregation weights based on the compositional relationship of local pixel pairs. With this relational approach…
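The adaptive aggregation the abstract describes can be illustrated with a minimal single-head NumPy sketch. This is not the paper's implementation: the projection shapes, the window handling, and the additive geometric-prior term are simplifying assumptions made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_relation_layer(x, wq, wk, geo_prior, k=3):
    """Simplified local relation aggregation (single head).

    x: (H, W, C) feature map; wq, wk: (C, Cm) query/key projections;
    geo_prior: (k, k) geometric prior over the window (hypothetical form).
    Unlike a fixed convolution filter, the aggregation weights at each
    pixel are computed from query-key similarity within its k x k
    neighborhood plus the geometric prior, then normalized by softmax.
    """
    H, W, C = x.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))  # zero-pad borders
    q = x @ wq    # per-pixel queries, (H, W, Cm)
    kf = xp @ wk  # keys on the padded map, (H+2p, W+2p, Cm)
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            win = xp[i:i+k, j:j+k, :]   # (k, k, C) value window
            kwin = kf[i:i+k, j:j+k, :]  # (k, k, Cm) key window
            # Content-adaptive logits: key-query similarity + geometric prior.
            logits = (kwin * q[i, j]).sum(-1) + geo_prior
            w = softmax(logits.reshape(-1)).reshape(k, k, 1)
            out[i, j] = (w * win).sum((0, 1))  # adaptive aggregation
    return out
```

Because the weights depend on the local content around each pixel, semantically similar regions with different spatial layouts can aggregate similarly, which fixed convolution filters cannot do.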

Cited by 482 publications (284 citation statements). References 27 publications.
“…In particular, [24] is seen as the first work to use the attention mechanism to capture the global dependencies of inputs. Recently, the attention mechanism has become popular and widely used in computer vision tasks [25][26][27][28][29][30]. The attention mechanism can simulate the human visual system and focus on areas of interest.…”
Section: B. Attention Mechanism (mentioning; confidence: 99%)
“…Recently, relation networks have been successfully applied in the fields of natural language processing [35,40], few-shot learning [38], object detection [18,19,46], action recognition [36] and multi-object tracking [48]. Since they make no strong assumptions about the form of the data, relation networks can be widely utilized to capture long-range, non-grid, or differently distributed dependencies between data, such as word-word [40], pixel-pixel [19,46], object-pixel [13] and object-object [18,48]. Such data are hard to model with regular convolutions or sequential networks.…”
Section: Relation Network (mentioning; confidence: 99%)
“…A relation network [35] was first used as a general solution to capture the core common properties of relational reasoning. Furthermore, RN [38] is used to learn a distance metric to compare small numbers of images within episodes, while Han et al. [18,19] proposed an object-object relation module and local relation networks to adaptively aggregate weights based on the content-aware relationship of elements. In order to capture heterogeneous dependencies, STRN [48] developed a spatial-temporal relation network that models the topological connection of objects in the spatial domain and performs aggregation over the temporal domain.…”
Section: Relation Network (mentioning; confidence: 99%)
“…All previous works try to find discriminative regions/patches directly from high-level feature (HLF) maps, while neglecting the fact that HLF maps are derived through the spatial aggregation of convolution, which is basically a pattern matching process that applies fixed filters [15,24,25]. This inherently causes response differences for semantically similar image parts with spatial variations.…”
Section: Introduction (mentioning; confidence: 99%)