Capturing accurate 3D human performances in global space from a static monocular video is an ill-posed problem. It requires resolving various depth ambiguities as well as knowledge of the camera's intrinsics and extrinsics. Therefore, most methods either learn on a fixed set of cameras or require the camera's parameters to be known. We instead show that a camera's extrinsics and intrinsics can be regressed jointly with the human's position in global space, joint angles, and body shape from nothing but long sequences of 2D motion estimates. We exploit the constant parameters of a static camera by training a model that can be applied to sequences of arbitrary length in a single forward pass while allowing full bidirectional information flow. We show that full temporal information flow is especially necessary when improving consistency through an adversarial network. Our training dataset is exclusively synthetic, and no domain adaptation is used. We achieve one of the best Human3.6M joint error results among models that do not use the Human3.6M training data.
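The core idea of the abstract can be illustrated with a minimal sketch: a bidirectional recurrent model consumes a 2D keypoint sequence of any length in a single forward pass, emits per-frame pose parameters, and pools over time to predict one shared camera vector, reflecting the static-camera assumption. All names, dimensions, and the RNN formulation below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def bidirectional_regressor(keypoints_2d, params):
    """Toy sketch (not the paper's model): regress per-frame pose and a single
    shared camera vector from a 2D keypoint sequence of arbitrary length T."""
    T, _ = keypoints_2d.shape
    H = params["W_f"].shape[0]
    h_f = np.zeros(H)
    h_b = np.zeros(H)
    fwd = np.zeros((T, H))
    bwd = np.zeros((T, H))
    for t in range(T):                      # forward-in-time recurrence
        h_f = np.tanh(params["W_f"] @ keypoints_2d[t] + params["U_f"] @ h_f)
        fwd[t] = h_f
    for t in reversed(range(T)):            # backward-in-time recurrence
        h_b = np.tanh(params["W_b"] @ keypoints_2d[t] + params["U_b"] @ h_b)
        bwd[t] = h_b
    feats = np.concatenate([fwd, bwd], axis=1)     # (T, 2H): bidirectional flow
    pose = feats @ params["W_pose"]                # per-frame pose parameters
    camera = feats.mean(axis=0) @ params["W_cam"]  # one camera for the clip
    return pose, camera

rng = np.random.default_rng(0)
H, D, P, C = 8, 34, 24, 7  # hidden size, 17 keypoints x 2, pose dims, camera dims
params = {
    "W_f": 0.1 * rng.normal(size=(H, D)), "U_f": 0.1 * rng.normal(size=(H, H)),
    "W_b": 0.1 * rng.normal(size=(H, D)), "U_b": 0.1 * rng.normal(size=(H, H)),
    "W_pose": 0.1 * rng.normal(size=(2 * H, P)),
    "W_cam": 0.1 * rng.normal(size=(2 * H, C)),
}
# The same weights handle sequences of different lengths in one forward pass.
pose_a, cam_a = bidirectional_regressor(rng.normal(size=(30, D)), params)
pose_b, cam_b = bidirectional_regressor(rng.normal(size=(100, D)), params)
```

Pooling the bidirectional features over all frames before predicting the camera is one simple way to encode that the camera does not change within a sequence, while the per-frame head still lets pose vary over time.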