“…Supervised methods [35,36,48,53,57] employ various loss functions [10,26,28,36,47,53] to measure the discrepancy between the predicted depth and the ground truth. However, such models fail to acquire sufficient structural information from the sparse annotations of driving scenes.…”
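
To make the supervised setup described above concrete, the following is a minimal sketch of a loss computed only on annotated pixels. It is not the method of any cited work: the choice of a plain L1 penalty, the function names `masked_l1_depth_loss` and `annotation_density`, and the use of `None` to mark pixels without ground truth are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def masked_l1_depth_loss(
    pred: Sequence[float],
    target: Sequence[Optional[float]],
) -> float:
    """Mean absolute depth error over annotated pixels only.

    `target` holds None where no ground-truth depth exists
    (a hypothetical encoding chosen for this sketch).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    # Keep only the pixels that actually carry a ground-truth value.
    errors = [abs(p - t) for p, t in zip(pred, target) if t is not None]
    if not errors:
        # No annotated pixels: this sample provides no supervision.
        return 0.0
    return sum(errors) / len(errors)


def annotation_density(target: Sequence[Optional[float]]) -> float:
    """Fraction of pixels that have a ground-truth depth value."""
    if not target:
        return 0.0
    return sum(t is not None for t in target) / len(target)


if __name__ == "__main__":
    # Flattened toy "image": only two of four pixels are annotated.
    pred = [2.0, 5.0, 9.0, 4.0]
    target = [None, 4.0, None, 7.0]
    print(masked_l1_depth_loss(pred, target))
    print(annotation_density(target))
```

In this sketch, pixels without an annotation contribute nothing to the loss, so the prediction at those locations is unconstrained by the supervision. That is one way to read the limitation the passage attributes to sparse annotations.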