Yacheng Tan scite author profile

Convolutional neural networks (CNNs) are good at extracting contextual features within certain receptive fields, while transformers can model global long-range dependencies. By combining the advantages of transformers and CNNs, Swin Transformer shows strong feature representation ability. Based on it, we propose a cross-modality fusion model, SwinNet, for RGB-D and RGB-T salient object detection. It is driven by Swin Transformer to extract hierarchical features, boosted by an attention mechanism to bridge the gap between the two modalities, and guided by edge information to sharpen the contour of the salient object. Specifically, a two-stream Swin Transformer encoder first extracts multi-modality features, and then a spatial alignment and channel re-calibration module is presented to optimize intra-level cross-modality features. To clarify fuzzy boundaries, an edge-guided decoder achieves inter-level cross-modality fusion under the guidance of edge features. The proposed model outperforms state-of-the-art models on RGB-D and RGB-T datasets, showing that it provides more insight into the cross-modality complementarity task. Code: https://github.com/liuzywen/SwinNet
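The idea of channel re-calibration across two modalities can be illustrated with a minimal sketch. This is not the SwinNet module: the function names, the nested-list `[C][H][W]` layout, and the gate formula (a sigmoid over pooled channel statistics, with no learned weights) are all assumptions made for illustration only.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def global_avg_pool(feat):
    """Reduce a [C][H][W] nested-list feature map to one mean value per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]

def recalibrate_and_fuse(rgb, aux):
    """Blend two same-shaped feature maps with one gate per channel.

    Hypothetical rule: gate_c = sigmoid(mean(rgb_c) + mean(aux_c)).
    A trained module would learn this mapping instead.
    """
    pooled_rgb = global_avg_pool(rgb)
    pooled_aux = global_avg_pool(aux)
    fused = []
    for c, (ch_rgb, ch_aux) in enumerate(zip(rgb, aux)):
        gate = sigmoid(pooled_rgb[c] + pooled_aux[c])
        # Convex combination: the gate weights RGB, its complement weights the other modality.
        fused.append([[gate * r + (1.0 - gate) * a for r, a in zip(row_r, row_a)]
                      for row_r, row_a in zip(ch_rgb, ch_aux)])
    return fused
```

Calling `recalibrate_and_fuse(rgb, depth)` on two feature maps of equal shape returns a fused map of the same shape, with each channel weighted by its own gate.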


Boosting Camouflaged Object Detection with Dual-Task Interactive Transformer

Liu

Tan

2022


HRTransNet: HRFormer-Driven Two-Modality Salient Object Detection

Tang

Liu

Tan

et al. 2023

IEEE Trans. Circuits Syst. Video Technol.


High-Resolution Transformer (HRFormer) can maintain high-resolution representations and share global receptive fields. It is well suited to salient object detection (SOD), in which the input and output have the same resolution. However, two critical problems need to be solved for two-modality SOD. One is two-modality fusion. The other is fusion of the HRFormer outputs. To address the first problem, a supplementary modality is injected into the primary modality by using global optimization and an attention mechanism to select and purify the modality at the input level. To solve the second problem, a dual-direction short connection fusion module is used to optimize the output features of HRFormer, thereby enhancing the detailed representation of objects at the output level. The proposed model, named HRTransNet, first introduces an auxiliary stream for feature extraction of the supplementary modality. Then, features are injected into the primary modality at the beginning of each multi-resolution branch. Next, HRFormer is applied to perform forward propagation. Finally, all the output features with different resolutions are aggregated by intra-feature and inter-feature interactive transformers. Applying the proposed model yields impressive improvements on two-modality SOD tasks, e.g., RGB-D, RGB-T, and light field SOD. Code: https://github.com/liuzywen/HRTransNet
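The final aggregation step combines output features of different resolutions. As a rough sketch of that general idea only, the code below upsamples every map to the largest resolution and averages them. Plain averaging is a swapped-in stand-in for the intra-feature and inter-feature interactive transformers, and the function names and the square-map assumption are hypothetical.

```python
def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsample of a 2-D map (list of rows) by an integer factor."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]  # repeat each value along the width
        out.extend([list(wide) for _ in range(factor)])  # repeat each row along the height
    return out

def aggregate_multires(maps):
    """Average square maps after bringing them all to the largest resolution.

    Assumption: every map is square and its side length divides the largest side.
    """
    target = max(len(m) for m in maps)
    ups = [upsample_nearest(m, target // len(m)) for m in maps]
    n = len(ups)
    return [[sum(u[i][j] for u in ups) / n for j in range(target)]
            for i in range(target)]
```

For example, `aggregate_multires([high, low])` with a 4x4 `high` map and a 2x2 `low` map returns a 4x4 map in which each entry is the mean of the two aligned values.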


HRTransNet: HRFormer-Driven Two-Modality Salient Object Detection

Tang

Liu

Tan

et al. 2023

Preprint


BGRDNet: RGB-D salient object detection with a bidirectional gated recurrent decoding network

Liu

Wang

Tan

2022

Multimed Tools Appl



SwinNet: Swin Transformer Drives Edge-Aware RGB-D and RGB-T Salient Object Detection
