“…While instance-level 3D object pose estimation has long been studied in both robotic and vision communities [19,1,21,42,48,43,60,47,37,24], class-level pose estimation has developed more recently thanks to learning-based methods [45,52,51,34,23,55,14,13,56,66,50]. These methods can be roughly divided into two categories: direct pose estimation methods that regress 3D orientations directly [52,45,34,55,63], and keypoint-based methods that predict 2D locations of 3D keypoints [14,13,56,66,50]. However, annotating 3D poses for objects in the wild is a tedious process of searching best-matching CAD models and aligning them to images [59,58].…”