“…As a general trend, ever deeper and more sophisticated neural network architectures are dominating hand pose estimation methods. They are highly accurate [4,5,9,11,12,20,23,31,50] when trained with large amounts of labeled samples. However, given that accurate 3D annotations are extremely difficult to obtain, a number of works approach the problem with deep generative models to leverage unlabelled data [1,3,21,28,29,36,49].…”