“…Teleoperation [39], as a direct means to acquire human demonstrations for teaching robots, has been a powerful paradigm to approach this goal [22,11,64,17,34,6,19,5,38,51]. Compared to gripper-based manipulators, teleoperating dexterous hand-arm systems poses unprecedented challenges and often requires specialized apparatus that comes with high costs and setup efforts, such as Virtual Reality (VR) devices [4,17,15], wearable gloves [29,30], handheld controllers [45,46,20], haptic sensors [12,23,50,53], or motion capture trackers [65]. Fortunately, recent developments in vision-based teleoperation [2,24,16,26,42,27,21,22,3] have provided a low-cost and more generalizable alternative for teleoperating dexterous robot systems.…”