“…Recent works have shown promising results on grasping transparent objects by completing the missing depth values and then applying a geometry-based grasp engine [1], [2], [3], by transfer learning from RGB-based grasping neural networks [4], by light-field feature learning [5], or by domain-randomized depth noise simulation [6]. For more advanced manipulation tasks such as rigid-body pick-and-place or liquid pouring, geometric estimates, such as symmetry axes, edges [7], or object poses [8], [9], [6], are required to model the manipulation trajectories. Instance-level transparent object poses can be estimated from keypoints on stereo RGB images [10], [11], from a light-field camera [12], [13], or directly from a single RGB-D image [9] under support-plane assumptions.…”