“…Depending on the number of input cameras, 3D human pose estimation methods are divided into monocular methods, which take single-view video as input [2,23,31,14,21,10,22,16,38], and multi-camera methods, which take synchronously captured multi-view videos as input [3,13,4,32,11,26,7,36,39,35].…”