“…Multimodal Reconstruction. Different modalities have been exploited for 3D sensing and reconstruction, including RF-based [19,25,26,77,78], inertial-based [53,72], and acoustic-based [7,10,14,15,52,70,75] approaches. Various applications, including self-driving cars [19], robot manipulation and grasping [40,65,66,68], and simultaneous localization and mapping (SLAM) [1,12,51,54,57], have benefited from multimodal reconstruction. Audio, given its ambient nature, has attracted particular attention in multimodal machine learning [3,17,32,39,41,48].…”