2022
DOI: 10.1609/aaai.v36i2.20133

Sparse MLP for Image Recognition: Is Self-Attention Really Necessary?

Abstract: Transformers have sprung up in the field of computer vision. In this work, we explore whether the core self-attention module in the Transformer is the key to achieving excellent performance in image recognition. To this end, we build an attention-free network called sMLPNet based on existing MLP-based vision models. Specifically, we replace the MLP module in the token-mixing step with a novel sparse MLP (sMLP) module. For 2D image tokens, sMLP applies 1D MLP along the axial directions and the parameters are shared among rows or columns. …
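
The abstract's description of sMLP maps naturally onto a few lines of code. Below is a minimal PyTorch sketch of the axial token mixing it describes; the module name SparseMLP, the (B, H, W, C) token layout, and fusing the identity and two axial branches through a single linear layer are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SparseMLP(nn.Module):
    """Axial token mixing: 1D MLPs along height and width, with weights
    shared across rows/columns (a sketch, not the official sMLPNet code)."""

    def __init__(self, channels: int, h: int, w: int):
        super().__init__()
        self.mix_w = nn.Linear(w, w)  # one MLP mixes a row; shared by every row and channel
        self.mix_h = nn.Linear(h, h)  # one MLP mixes a column; shared by every column and channel
        self.fuse = nn.Linear(3 * channels, channels)  # assumed fusion of the three branches

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, H, W, C) grid of image tokens
        x_w = self.mix_w(x.permute(0, 3, 1, 2)).permute(0, 2, 3, 1)  # mix along width
        x_h = self.mix_h(x.permute(0, 3, 2, 1)).permute(0, 3, 2, 1)  # mix along height
        return self.fuse(torch.cat([x, x_w, x_h], dim=-1))           # back to (B, H, W, C)

tokens = torch.randn(2, 14, 14, 96)         # a 14x14 grid of 96-dim tokens
block = SparseMLP(channels=96, h=14, w=14)
print(block(tokens).shape)                  # torch.Size([2, 14, 14, 96])
```

Because each 1D MLP is shared across rows (or columns), the token-mixing weights scale as H^2 + W^2 rather than the (HW)^2 of a full token-mixing MLP, which is the sparsity the title refers to.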


Cited by 41 publications (15 citation statements). References 23 publications.
Citation types: 0 supporting, 13 mentioning, 0 contrasting.

“…Can we find such a PWLNN by explicitly seeking a shallow PWLNN or implicitly regularizing the learning of a PWL-DNN? What are the differences and relations between PWLNNs and other kinds of NNs that address locally-dominant features [195]?…”
Section: Discussion (mentioning)
confidence: 99%
“…The module first extracts the long-term context dependencies of each modality using an LSTM, and then derives the temporal vectors of each modality. After that, the sparse MLP [57] is used to mix the temporal-importance information of the two modalities, yielding an attention vector that carries interaction information. Finally, this attention vector is used to guide multimodal feature fusion.…”
Section: TAMF Module (mentioning)
confidence: 99%
“…2) The temporal features of the two modalities are concatenated to obtain the vector Concat_vector. To let the timing information of the two modalities interact, we mix Concat_vector along its vertical and horizontal directions through the weight sharing and sparse connections of the sparse MLP [57], obtaining the mixed attention vector x_mix.…”
Section: TAMF Module (mentioning)
confidence: 99%
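
For intuition, here is a rough PyTorch sketch of the interaction step the two statements above describe, assuming each modality yields (B, T, D) temporal features (e.g. LSTM outputs). The class name TemporalMix, the stacking along the time axis, and the sigmoid gating at the end are assumptions for illustration, not the cited paper's code.

```python
import torch
import torch.nn as nn

class TemporalMix(nn.Module):
    """Sketch: concatenate two modalities' temporal features, then mix them
    along the time ("vertical") and feature ("horizontal") axes with
    weight-shared 1D linear layers, in the spirit of sMLP [57]."""

    def __init__(self, t: int, d: int):
        super().__init__()
        self.mix_time = nn.Linear(2 * t, 2 * t)  # mixes across the 2T stacked time steps
        self.mix_feat = nn.Linear(d, d)          # mixes across the D feature dimensions

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        # feat_a, feat_b: (B, T, D) temporal features of the two modalities
        concat = torch.cat([feat_a, feat_b], dim=1)                    # (B, 2T, D)
        mixed = self.mix_time(concat.transpose(1, 2)).transpose(1, 2)  # time mixing
        mixed = self.mix_feat(mixed)                                   # feature mixing
        return torch.sigmoid(mixed)  # x_mix as attention weights (gating is assumed)
```

The returned x_mix would then weight the concatenated features before fusion, as the statements above describe.
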
“…sMLP block. Chuanxin Tang et al. proposed Sparse MLP (sMLP) [21] building on MLP-based vision models, replacing the MLP module in the token-mixing step with a new sMLP module. For a 2D image, sMLP applies a 1D MLP along the image height and width, so the parameters are shared between rows or columns.…”
Section: GPA-TUNet (mentioning)
confidence: 99%
“…To solve these problems, we design a new attention mechanism, GPA, and adopt the Sparse MLP (sMLP) proposed by Chuanxin Tang et al. [21]. We combine GPA with the Transformer as the encoder.…”
Section: Introduction (mentioning)
confidence: 99%