Query2Label: A Simple Transformer Way to Multi-Label Classification

Liu, Shilong; Zhang, Lei; Yang, Xiao; Su, Hang; Zhu, Jun

doi:10.48550/arxiv.2107.10834

Cited by 26 publications

(48 citation statements)

References 32 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…[48] suggested simple spatial attention scores, and then combined them with class-agnostic average pooling features. [24] presented a pooling transformer with learnable queries for multi-label classification, achieving top results.…”

Section: Equal Contributionmentioning

confidence: 99%

“…GAP was also adopted as a baseline approach for mutli-label classification [3,25,38] Attention-based: Unlike single-label classification, in multi-label classification several objects can appear in the image, in different locations and sizes. Several works [14,24,48] have noticed that the GAP operation, which eliminates the spatial dimension via simple averaging, can be sub-optimal for identifying multiple objects with different sizes. Instead they suggested using attention-based classification heads, which enable more elaborate usage of the spatial data, with improved results.…”

Section: Baseline Classification Headsmentioning

confidence: 99%

“…Among the attention-based classification heads proposed, a simple approach based on a transformer-decoder, similar to the one used by DETR for object detection [5], has achieved top results on multi-label classification [24].…”

Section: Recap -Attention and Transformer-decodermentioning

confidence: 99%

“…On multi-label datasets with small number of classes, such as MS-COCO [23] and Pascal-VOC [13] (80 and 20 classes respectively), transformer-decoder classification head works well, and achieves state-of-the-art results [24], with small additional computational overhead. However, it suffers from a critical drawback -the computational cost is quadratic with the number of classes.…”

Section: Motivationmentioning

confidence: 99%

See 3 more Smart Citations

ML-Decoder: Scalable and Versatile Classification Head

Ridnik¹,

Sharir²,

Ben-Cohen³

et al. 2021

Preprint

View full text Add to dashboard Cite

In this paper, we introduce ML-Decoder, a new attentionbased classification head. ML-Decoder predicts the existence of class labels via queries, and enables better utilization of spatial data compared to global average pooling. By redesigning the decoder architecture, and using a novel group-decoding scheme, ML-Decoder is highly efficient, and can scale well to thousands of classes. Compared to using a larger backbone, ML-Decoder consistently provides a better speed-accuracy trade-off. ML-Decoder is also versatile -it can be used as a drop-in replacement for various classification heads, and generalize to unseen classes when operated with word queries. Novel query augmentations further improve its generalization ability. Using ML-Decoder, we achieve state-of-the-art results on several classification tasks: on MS-COCO multi-label, we reach 91.4% mAP; on NUS-WIDE zero-shot, we reach 31.1% ZSL mAP; and on ImageNet single-label, we reach with vanilla ResNet50 backbone a new top score of 80.7%, without extra data or distillation.

show abstract

Section: Equal Contributionmentioning

confidence: 99%

Section: Baseline Classification Headsmentioning

confidence: 99%

Section: Recap -Attention and Transformer-decodermentioning

confidence: 99%

Section: Motivationmentioning

confidence: 99%

See 2 more Smart Citations

ML-Decoder: Scalable and Versatile Classification Head

Ridnik¹,

Sharir²,

Ben-Cohen³

et al. 2021

Preprint

View full text Add to dashboard Cite

show abstract

“…Evaluation Metrics. Following previous works [15,38,42], beside the mean average precision (mAP), we employ several metrics to better demonstrate the performance of the proposed approach. Under the premise that the predicted label is positive, if the output probability is greater than a threshold (e.g., 0.5), we report the average per-class precision (CP), recall (CR), and F1 score (CF1).…”

Section: Multi-label Classificationmentioning

confidence: 99%

MC-SSL0.0: Towards Multi-Concept Self-Supervised Learning

Atito,

Awais,

Farooq

et al. 2021

Preprint

View full text Add to dashboard Cite

Self-supervised pretraining is the method of choice for natural language processing models and is rapidly gaining popularity in many vision tasks. Recently, self-supervised pretraining has shown to outperform supervised pretraining for many downstream vision applications, marking a milestone in the area. This superiority is attributed to the negative impact of incomplete labelling of the training images, which convey multiple concepts, but are annotated using a single dominant class label. Although Self-Supervised Learning (SSL), in principle, is free of this limitation, the choice of pretext task facilitating SSL is perpetuating this shortcoming by driving the learning process towards a single concept output. This study aims to investigate the possibility of modelling all the concepts present in an image without using labels. In this aspect the proposed SSL framework MC-SSL0.0 is a step towards Multi-Concept Self-Supervised Learning (MC-SSL) that goes beyond modelling single dominant label in an image to effectively utilise the information from all the concepts present in it. MC-SSL0.0 consists of two core design concepts, group masked model learning and learning of pseudo-concept for data token using a momentum encoder (teacher-student) framework. The experimental results on multi-label and multi-class image classification downstream tasks demonstrate that MC-SSL0.0 not only surpasses existing SSL methods but also outperforms supervised transfer learning. The source code will be made publicly available for community to train on bigger corpus. 1

show abstract

Brain tumor image segmentation based on prior knowledge via transformer

Li,

Liu,

Nie

et al. 2023

Int J Imaging Syst Tech

View full text Add to dashboard Cite

Many researchers use AI to improve the accuracy of early diagnostic techniques. However, as a result of the tumor's uneven shape, fuzzy borders and too few data, existing tumor segmentation methods do not propose accurate segmentation results. We innovative introduces the prior knowledge learned to filter the noise information and guide the final network to generate a more accurate segmentation model. First, we introduce a classification network with an attention block to highlight the potential location of the brain tumor and also obtain the rough diagnosis result as the prior knowledge. Second, we provide a novel image fusion network consisting of a transformer with cross attention to merge tumor localization information with brain MRI images. Third, we propose a novel multilayer transformer experience information fusion network to combine the classic U‐Net network to handle the guiding of prior knowledge. The higher performance of the suggested method is demonstrated by comparison with contemporary methods.

show abstract

Query2Label: A Simple Transformer Way to Multi-Label Classification

Cited by 26 publications

References 32 publications

ML-Decoder: Scalable and Versatile Classification Head

ML-Decoder: Scalable and Versatile Classification Head

MC-SSL0.0: Towards Multi-Concept Self-Supervised Learning

Brain tumor image segmentation based on prior knowledge via transformer

Contact Info

Product

Resources

About