“…While deep learning has demonstrated great success in various application domains [Russakovsky et al., 2015, Silver et al., 2016], large-scale annotated data for supervision inevitably becomes the bottleneck. Many works thus explore self-supervised learning via active perception [Wilkes & Tsotsos, 1992], interactive perception [Bohg et al., 2017], or interactive exploration [Wyatt et al., 2011] to learn visual representations [Fang et al., 2020, Jayaraman & Grauman, 2018, Weihs et al., 2019, Zakka et al., 2020], objects and poses [Caicedo & Lazebnik, 2015, Chaplot et al., 2020b, Choi et al., 2021], segmentation and parts [Eitel et al., 2019, Gadre et al., 2021, Katz & Brock, 2008, Kenney et al., 2009, Lohmann et al., 2020, Pathak et al., 2018, Van Hoof et al., 2014], physics and dynamics [Agrawal et al., 2016, Ehsani et al., 2020, Janner et al., 2018, Li et al., 2016, Lohmann et al., 2020, Mottaghi et al., 2016, Wu et al., 2015], manipulation skills [Agrawal et al., 2016, Batra et al., 2020, Zeng et al., 2018], navigation policies [Anderson et al., 2018, Chaplot et al., 2020a, Ramakrishnan et al., 2021], etc. In this work, we design interactive policies to explore novel 3D indoor rooms and learn our newly proposed inter-object functional relationships.…”