“…For both surface normal estimation and depth completion, the batch size was set to 24. For the second stage, the training hyperparameters of Pointformer and of the pose and scale estimation were selected following [34], [32]. The learning rates for all loss terms were kept the same during training: $\{\lambda_{rx}, \lambda_{rz}, \lambda_{ra}, \lambda_{t}, \lambda_{s}, \lambda_{conx}, \lambda_{conz}\} = \{8, 8, 4, 8, 8, 1, 1\} \times 10^{-4}$.…”
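
The settings quoted above can be written down as a small configuration sketch. This is an illustration only, not the authors' code. The dictionary keys, the constant names, and the helper `combine_losses` are hypothetical. The sketch also assumes that each coefficient scales its own loss term before the terms are summed, which the excerpt does not state explicitly.

```python
# Values taken from the quoted text; all names here are hypothetical.
FIRST_STAGE_BATCH_SIZE = 24  # surface normal estimation and depth completion

BASE = 1e-4  # the common factor written as "e-4" in the text
LOSS_COEFFS = {
    "rx": 8 * BASE,
    "rz": 8 * BASE,
    "ra": 4 * BASE,
    "t": 8 * BASE,
    "s": 8 * BASE,
    "conx": 1 * BASE,
    "conz": 1 * BASE,
}


def combine_losses(terms, coeffs=LOSS_COEFFS):
    """Assumed usage: scale each loss term by its coefficient and sum.

    `terms` maps the same keys as `coeffs` to scalar loss values.
    """
    missing = set(coeffs) - set(terms)
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    return sum(coeffs[name] * terms[name] for name in coeffs)
```

Because the coefficients are kept the same during training, a single module-level dictionary is enough here; a schedule would only be needed if they varied over epochs.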