Transformers have shown their effectiveness and advantages in many computer vision tasks, such as image classification and object re-identification (ReID). However, existing vision transformers are stacked layer by layer, lacking direct information exchange among layers. Inspired by DenseNet, we propose a dense transformer framework (termed Denseformer) that connects each layer to every other layer through class tokens. We demonstrate that Denseformer consistently achieves better performance on person ReID across datasets (Market-1501, DukeMTMC, MSMT17, and Occluded-Duke), at only a negligible increase in computation. We show that Denseformer has compelling advantages: it pays more attention to the main parts of the human body and obtains more discriminative global features.
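To make the dense class-token connections concrete, below is a minimal PyTorch sketch of one possible Denseformer block. It is an illustration under stated assumptions, not the authors' reference implementation: the abstract only says each layer is connected to every other layer through class tokens, so the concatenate-then-project fusion layer (`fuse`) and its placement before attention are hypothetical design choices.

```python
import torch
import torch.nn as nn

class DenseformerBlock(nn.Module):
    """One transformer block that, DenseNet-style, fuses the class tokens
    of all preceding blocks into its own class token before attention.
    Illustrative sketch only; the fusion scheme is an assumption."""

    def __init__(self, dim: int, num_heads: int, layer_idx: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        # Block i sees its own class token plus i earlier ones, so this
        # (hypothetical) fusion projects (i + 1) * dim back down to dim.
        self.fuse = nn.Linear((layer_idx + 1) * dim, dim)

    def forward(self, x, prev_cls_tokens):
        # x: (B, 1 + N, dim) -- class token followed by N patch tokens.
        # prev_cls_tokens: list of (B, dim) class tokens from earlier blocks.
        dense = torch.cat([x[:, 0]] + prev_cls_tokens, dim=-1)
        cls = self.fuse(dense).unsqueeze(1)      # (B, 1, dim)
        x = torch.cat([cls, x[:, 1:]], dim=1)    # splice fused token back in
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.norm2(x))
        return x, x[:, 0]  # updated sequence and this block's class token


# Usage: thread the growing list of class tokens through the stack.
blocks = nn.ModuleList(DenseformerBlock(768, 12, i) for i in range(4))
x = torch.randn(2, 1 + 196, 768)  # batch of 2, 196 patch tokens
cls_tokens = []
for block in blocks:
    x, cls = block(x, cls_tokens)
    cls_tokens.append(cls)
```

Under this sketch, the extra parameter cost is one linear projection per block whose input width grows with depth, which stays small relative to the attention and MLP weights; this is consistent with the abstract's claim of a negligible increase in computation.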