2022
DOI: 10.1007/978-3-031-19818-2_7

Cost Aggregation with 4D Convolutional Swin Transformer for Few-Shot Segmentation

Cited by 43 publications (24 citation statements)
References 59 publications
“…The cosine similarity between each point in the query feature map and the support features is defined as:

$$CS(x_q, x_s) = \frac{\sum_{i=1}^{|F_s^{b,l,o}|} \mathrm{ReLU}\!\left( \dfrac{x_q^{T} \cdot x_{s,i}}{\left\| x_q \right\| \left\| x_{s,i} \right\|} \right)}{\left| F_s^{b,l,o} \right|}$$

where $x_q$ is the vector of a point in $F_q^{b,l}$, $x_{s,i}$ is the $i$-th vector in $x_s = F_s^{b,l,o}$, and $|F_s^{b,l,o}|$ is the number of elements in $F_s^{b,l,o}$. As done in [30, 46], we use the ReLU() function to make the network focus only on how similar the query point is to the specific class object in the support image, rather than on how they differ. CS() thus describes the similarity between each point in the query feature map and all feature points in the support set.…”
Section: Our Methods Overview (mentioning)
confidence: 99%
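Read as stated, CS() averages rectified cosine similarities between one query vector and all support foreground vectors. A minimal PyTorch sketch of that computation follows; the function name and shapes are illustrative assumptions (not the cited paper's code), with $F_s^{b,l,o}$ taken to be a stack of N foreground feature vectors:

```python
import torch
import torch.nn.functional as F

def cs_map(F_q, F_s_fg):
    """Sketch of the CS() term above: for every query position x_q,
    average the ReLU-rectified cosine similarity to all support
    foreground vectors x_{s,i}.
    F_q:     (C, Hq, Wq) query feature map F_q^{b,l}
    F_s_fg:  (N, C)      the N foreground vectors F_s^{b,l,o} (assumed layout)."""
    C, Hq, Wq = F_q.shape
    q = F.normalize(F_q.flatten(1).t(), dim=1)   # (Hq*Wq, C), unit-norm query vectors
    s = F.normalize(F_s_fg, dim=1)               # (N, C), unit-norm support vectors
    sim = torch.relu(q @ s.t())                  # (Hq*Wq, N) rectified cosine similarity
    return sim.mean(dim=1).view(Hq, Wq)          # average over the N support vectors
```

The ReLU before averaging matches the quoted rationale: negative cosines (dissimilarity) are zeroed so only positive evidence of the class contributes.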
“…Recently, refs. [30, 46] used 4D convolution to establish hyper-relations between multi-layer features, but 4D convolution has high space and time complexity. In this paper, we use multi-similarity to build a more robust semantic relationship between support and query images.…”
Section: Related Work (mentioning)
confidence: 99%
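For context on the complexity claim: a dense k×k×k×k 4D kernel carries k⁴ weights per channel pair, which is why 4D convolutions over cost volumes are usually factorized. The sketch below is an assumption-laden illustration following the separable form popularized by HSNet's center-pivot 4D convolution, not the cited papers' exact code; it replaces the dense kernel with the sum of two 2D convolutions (2·k² weights per channel pair):

```python
import torch
import torch.nn as nn

class Separable4DConv(nn.Module):
    """Factorized 4D convolution over a cost volume (B, C, Hq, Wq, Hs, Ws):
    one 2D conv over the query dims plus one 2D conv over the support dims,
    instead of a dense k^4 kernel."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.conv_q = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.conv_s = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)

    def forward(self, corr: torch.Tensor) -> torch.Tensor:
        B, C, Hq, Wq, Hs, Ws = corr.shape
        # Branch 1: convolve over (Hq, Wq); fold support positions into the batch.
        q = corr.permute(0, 4, 5, 1, 2, 3).reshape(B * Hs * Ws, C, Hq, Wq)
        q = self.conv_q(q)
        q = q.reshape(B, Hs, Ws, -1, Hq, Wq).permute(0, 3, 4, 5, 1, 2)
        # Branch 2: convolve over (Hs, Ws); fold query positions into the batch.
        s = corr.permute(0, 2, 3, 1, 4, 5).reshape(B * Hq * Wq, C, Hs, Ws)
        s = self.conv_s(s)
        s = s.reshape(B, Hq, Wq, -1, Hs, Ws).permute(0, 3, 1, 2, 4, 5)
        return q + s  # (B, out_ch, Hq, Wq, Hs, Ws)

if __name__ == "__main__":
    corr = torch.randn(2, 16, 13, 13, 13, 13)      # toy cost volume
    out = Separable4DConv(16, 32)(corr)
    print(out.shape)                               # torch.Size([2, 32, 13, 13, 13, 13])
```

Even factorized, the activations still scale with Hq·Wq·Hs·Ws, which is the space-complexity concern the quoted passage raises.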
“…Existing FSS approaches follow the metric-learning framework and include parameter-based, prototype-based, and hybrid methods. Parameter-based methods [47]-[51] compare the pairwise distance between query and support with a parametric model, such as a linear classifier [47], 4D convolution [50], or Gaussian processes [49]. CWT [47] designed a classifier weight transformer that tunes the transformer weights online with a linear classifier trained on the support set, which simplifies the meta-learning task.…”
Section: Few-shot Semantic Segmentation (mentioning)
confidence: 99%
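As an illustration of the parameter-based idea only (a hedged sketch; the function name, shapes, and optimizer settings are assumptions, and CWT's actual weight-transformer mechanism is more involved), one can fit a per-episode linear classifier on masked support features and apply it pixel-wise to the query:

```python
import torch
import torch.nn.functional as F

def support_trained_linear_classifier(feat_s, mask_s, feat_q, iters=100, lr=0.1):
    """Fit a per-episode linear classifier on support features with the
    given foreground mask, then apply it to query features.
    feat_s: (B, C, H, W)   support features
    mask_s: (B, 1, H, W)   foreground mask in {0, 1} (float)
    feat_q: (B, C, Hq, Wq) query features."""
    B, C, H, W = feat_s.shape
    _, _, Hq, Wq = feat_q.shape
    w = torch.zeros(B, 1, C, requires_grad=True)      # per-episode weights
    b = torch.zeros(B, 1, 1, requires_grad=True)      # per-episode bias
    opt = torch.optim.SGD([w, b], lr=lr)
    x = feat_s.flatten(2).transpose(1, 2)             # (B, H*W, C)
    y = mask_s.flatten(2).transpose(1, 2)             # (B, H*W, 1)
    for _ in range(iters):
        logits = torch.einsum('bnc,boc->bno', x, w) + b
        loss = F.binary_cross_entropy_with_logits(logits, y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    xq = feat_q.flatten(2).transpose(1, 2)            # (B, Hq*Wq, C)
    pred = torch.einsum('bnc,boc->bno', xq, w) + b
    return pred.transpose(1, 2).reshape(B, 1, Hq, Wq).sigmoid()
```

The design point this illustrates: the comparison between query and support is carried by learned parameters (here w, b) rather than by a fixed prototype distance.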
“…Following a similar idea, HSNet [11] computes the pixel-wise correlation between support-query pairs and enhances the correlation matrix with 4D convolutional operations. VAT [12] extends this correlation-enhancement module from a 4D convolutional network to a 4D Swin Transformer [41]. Although pixel-wise correlation retains the most abundant category information, these approaches can incur unnecessary information loss because they ignore the support background.…”
Section: B. Few-shot Segmentation (mentioning)
confidence: 99%
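The correlation tensor that those 4D modules operate on can be written down compactly. Below is a minimal, assumption-laden sketch (illustrative names and shapes, not HSNet's or VAT's code) of a cosine cost volume between query features and foreground-masked support features:

```python
import torch
import torch.nn.functional as F

def pixelwise_correlation(feat_q, feat_s, mask_s):
    """Cosine similarity between every query position and every
    (foreground-masked) support position, the raw input that
    HSNet/VAT-style 4D modules then refine.
    feat_q: (B, C, Hq, Wq); feat_s: (B, C, Hs, Ws); mask_s: (B, 1, Hs, Ws)."""
    B, C, Hq, Wq = feat_q.shape
    Hs, Ws = feat_s.shape[-2:]
    q = F.normalize(feat_q.flatten(2), dim=1)              # (B, C, Hq*Wq)
    s = F.normalize((feat_s * mask_s).flatten(2), dim=1)   # zero out background
    corr = torch.einsum('bcq,bcs->bqs', q, s)              # (B, Hq*Wq, Hs*Ws)
    corr = corr.clamp(min=0)                               # keep only similarities
    return corr.view(B, Hq, Wq, Hs, Ws)
```

Note the masking step: zeroing background support positions is exactly what the quoted passage flags as a potential source of information loss.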
“…To address this issue, some recent works [11], [12] explore the pixel-wise correlations between the query images and the foreground support features, and have shown advantages over prototype-based approaches. However, these approaches ignore the backgrounds of the support images.…”
Section: Introduction (mentioning)
confidence: 99%