2023
DOI: 10.1007/s41095-023-0364-2

Visual attention network

Abstract: While originally designed for natural language processing tasks, the self-attention mechanism has recently taken various computer vision areas by storm. However, the 2D nature of images brings three challenges for applying self-attention in computer vision: (1) treating images as 1D sequences neglects their 2D structures; (2) the quadratic complexity is too expensive for high-resolution images; (3) it only captures spatial adaptability but ignores channel adaptability. In this paper, we propose a novel linear …
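The abstract is truncated, but the citing papers below refer to the proposed mechanism as "large kernel attention". The following is a minimal, illustrative PyTorch sketch of a decomposed large-kernel attention block of that kind, assuming a roughly 21×21 receptive field split into a 5×5 depth-wise convolution, a 7×7 depth-wise dilated convolution (dilation 3), and a 1×1 point-wise convolution; the class and parameter names are mine, not taken from the authors' code.

```python
import torch
import torch.nn as nn

class LargeKernelAttention(nn.Module):
    """Decomposed large-kernel attention (illustrative sketch, not the official code).

    A large receptive field is approximated by chaining a 5x5 depth-wise
    convolution, a 7x7 depth-wise convolution with dilation 3, and a 1x1
    point-wise convolution; the result gates the input element-wise.
    """
    def __init__(self, channels: int):
        super().__init__()
        # local spatial context (depth-wise)
        self.dw_conv = nn.Conv2d(channels, channels, kernel_size=5,
                                 padding=2, groups=channels)
        # long-range spatial context (depth-wise, dilated)
        self.dw_dilated = nn.Conv2d(channels, channels, kernel_size=7,
                                    padding=9, dilation=3, groups=channels)
        # channel mixing (point-wise), giving channel adaptability
        self.pw_conv = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.pw_conv(self.dw_dilated(self.dw_conv(x)))
        return attn * x  # element-wise gating keeps the cost linear in H*W


if __name__ == "__main__":
    x = torch.randn(1, 64, 56, 56)
    print(LargeKernelAttention(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```

Because the gating is produced by convolutions rather than a token-to-token similarity matrix, the cost grows linearly with the number of pixels while still combining local structure, long-range context, and channel mixing.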

Citations: cited by 196 publications (55 citation statements)
References: 106 publications
“…To recognize images accurately, researchers have proposed various architectures and techniques for CNNs, such as using multiple layers, 23 skip connections, 24 dense connections, 25 squeeze and excitation steps, 32 attention mechanisms, 33 and large kernel attention. 34 To remedy the limitations of the local inductive bias in modeling the global representations, transformer-based networks (e.g., CvT-13, 28 Swin Transformer, 31 ViT-B/16, 29 PVT, 30 PoolFormer-S12, 35 and BEiT-B 36 ) are proposed to model the long-range dependencies in feature space via a self-attention mechanism. However, the aforementioned networks are prone to overfitting when trained from scratch on few-shot samples.…”
Section: Image Recognition Techniques
confidence: 99%
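The statement above notes that transformer-based backbones model long-range dependencies via self-attention. As a point of contrast with the abstract's complexity argument, here is a minimal, illustrative PyTorch sketch of single-head self-attention over flattened image tokens (learned query/key/value projections omitted for brevity); the function name and shapes are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def self_attention_2d(x: torch.Tensor) -> torch.Tensor:
    """Plain single-head self-attention over flattened image tokens.

    Treating an H x W feature map as a 1D sequence of N = H*W tokens
    discards the 2D layout, and the N x N score matrix makes the cost
    quadratic in resolution -- the two issues the abstract raises.
    """
    b, c, h, w = x.shape
    tokens = x.flatten(2).transpose(1, 2)               # (B, N, C), N = H*W
    scores = tokens @ tokens.transpose(1, 2) / c ** 0.5  # (B, N, N) -- O(N^2)
    out = F.softmax(scores, dim=-1) @ tokens              # weighted mix of tokens
    return out.transpose(1, 2).reshape(b, c, h, w)

# Doubling the spatial resolution quadruples N and grows the score matrix
# sixteen-fold, which is why high-resolution inputs become expensive.
x = torch.randn(1, 64, 28, 28)
print(self_attention_2d(x).shape)  # torch.Size([1, 64, 28, 28])
```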
“…We used ViT-B/16-B/32 4 as our CLIP network architecture. We compared TEG to the SOTA image recognition methods in various few-shot settings, including CNN-based networks (VGG-11, 23 VGG-19, 23 ConvNeXt-T, 64 and VAN-B2 34 ), Transformer-based networks (ViT-B/16, 29 CvT-13, 28 Swin Transformer 31 (Swin-T), PoolFormer-S12, 35 BEiT-B, 36 and EfficientFormer-L1 65 ), as well as CLIP-based fine-tuning methods (zero-shot CLIP, 4 linear-probe CLIP, 4 CoOp, 5 and WiSE-FT (linear classifier, α = 0.5) 48 ). All the compared models were implemented using the PyTorch framework.…”
Section: Vegetable
confidence: 99%
“…We employ contemporary strategies that synergize with DenseNets as well. Our methodology eventually exceeds strong modern architectures [21,25,42,45,57,97] and some milestones like Swin Transformer [47], ConvNeXt [48], and DeiT-III [71] in performance trade-offs on ImageNet-1K [59]. Our models demonstrate competitive performance on downstream tasks such as ADE20K semantic segmentation and COCO object detection/instance segmentation.…”
Section: Introduction
confidence: 97%
“…The attention mechanism plays a significant role in various domains of machine learning, including Natural Language Processing (NLP) and Computer Vision (CV), 16–23 among others. Broadly speaking, attention can be regarded as a tool for directing available processing resources towards the most informative elements of an input signal. 24,25 …”
Section: Introduction
confidence: 99%