“…Token adoptation in vision tasks: At the moment, tokenbased models are widely applied in almost all domains in vision, including classification [22,46,65], object detection [6,16,90], segmentation [23,74], image generation [5,20,24,38,43], video understanding [1,2,4,9,25,28,41,45,47,49,56,85], dense prediction [54,75], point clouds processing [30,88], reinforcement learning [10,37] and tracking [60].…”