2023
DOI: 10.1016/j.neucom.2022.10.081

Transformers and CNNs fusion network for salient object detection

citations
Cited by 12 publications
(5 citation statements)
references
References 29 publications
“…$Y_{\mathrm{Transformer}} = \mathrm{flatten}(\mathrm{Transformer}_{\mathrm{featur\_shape}})$ (18) Then, the $Y_{\mathrm{Transformer}}$ is passed onto the deep neural network classifier. Moreover, the required optimum parameters for the transformer feature extractor of the Conv-ViT network are summarized in Table 1.…”
Section: Vision Transformer (mentioning)
confidence: 99%
“…This ViT-ARN was trained using a total of 858 and 1600 videos from two datasets and was evaluated based on two datasets-LAD-2000 and UCF-Crime datasets where the proposed framework outperformed other state-of-the-art approaches with an increased accuracy of 10.14% and 3% in these two datasets, respectively. In a separate study, Yao et al [18] proposed a fusion of transformers and CNN for salient object detection (SOD) where the transformer captured the long-distance pixel relationship, and later, a CNN was applied, which extracted the fine-grained local details. This incorporation resolved the problem of using a CNN-based network and showed equal effectivity for both RGB and RGB-D (RGB and depth) SOD.…”
Section: Introduction (mentioning)
confidence: 99%
“…This evolution can be traced back to AlexNet [17], which introduced a fundamental CNN architecture that achieved groundbreaking results on challenging datasets. Inspired by AlexNet, researchers began applying convolutional neural networks to various deep learning tasks, establishing them as one of the prevailing approaches in contemporary research [18][19][20][21][22]. Initially, the approach involved using convolutional neural networks in a sliding window model [23]. However, due to computational complexity limitations, the sliding window approach gradually gave way to the region proposal method [24].…”
Section: Related Work (mentioning)
confidence: 99%
“…Salient object detection (SOD) aims at detecting the most visually attractive objects from the inputs [1], [2], which has been widely performed on many computer vision tasks, such as tracking [3], segmentation [4], [5], action recognition [6], camouflaged object detection [7], and so on.…”
Section: Introduction (mentioning)
confidence: 99%