We propose for the first time, a novel deep active visuo-tactile cross-modal full-fledged framework for object recognition by autonomous robotic systems. Our proposed network xAVTNet is actively trained with labelled point clouds from a vision sensor with one robot and tested with an active tactile perception strategy to recognise objects never touched before using another robot. We propose a novel visuo-tactile loss (VTLoss) to minimise the discrepancy between the visual and tactile domains for unsupervised domain adaptation. Our framework leverages the strengths of deep neural networks for cross-modal recognition along with active perception and active learning strategies for increased efficiency by minimising redundant data collection. Our method is extensively evaluated on a real robotic system and compared against baselines and other state-of-art approaches. We demonstrate clear outperformance in recognition accuracy compared to the state-of-art visuo-tactile cross-modal recognition method.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.