This paper addresses robot ego-motion estimation relying solely on acoustic sensing. By equipping a robot with microphones, we investigate the possibility of exploiting the noise generated by the vehicle's motors and actuators to estimate its motion. Audio-based odometry is unaffected by the scene's appearance, lighting conditions, or structure. This makes sound a compelling auxiliary source of information for ego-motion modelling in environments where more traditional methods, such as those based on visual or laser odometry, are particularly challenged. By leveraging multi-task learning and deep architectures, we provide a regression framework that estimates the linear and angular velocities at which the robot has been travelling. Our experimental evaluation, conducted on approximately two hours of data collected with an unmanned outdoor field robot, demonstrates an absolute error lower than 0.07 m/s and 0.02 rad/s for the linear and angular velocity, respectively. Compared to a baseline approach using a single-task learning scheme, our system shows an improvement of up to 26% in ego-motion estimation.
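
The multi-task formulation mentioned above can be illustrated with a minimal sketch: a shared encoder feeds two task-specific regression heads, one per velocity component, and training minimizes a weighted sum of the per-task losses. All names, dimensions, and the loss weighting below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 128 audio feature frames, 32-dim features each.
X = rng.standard_normal((128, 32))      # e.g. per-frame spectral statistics
y_lin = rng.standard_normal((128, 1))   # target linear velocity [m/s]
y_ang = rng.standard_normal((128, 1))   # target angular velocity [rad/s]

# Shared encoder weights plus two task-specific linear heads.
W_shared = rng.standard_normal((32, 16)) * 0.1
W_lin = rng.standard_normal((16, 1)) * 0.1
W_ang = rng.standard_normal((16, 1)) * 0.1

def forward(X):
    """Shared representation followed by per-task predictions."""
    h = np.tanh(X @ W_shared)
    return h @ W_lin, h @ W_ang

def multitask_loss(p_lin, p_ang, alpha=0.5):
    """Weighted sum of the two regression losses (alpha is an assumption)."""
    mse_lin = np.mean((p_lin - y_lin) ** 2)
    mse_ang = np.mean((p_ang - y_ang) ** 2)
    return alpha * mse_lin + (1.0 - alpha) * mse_ang

p_lin, p_ang = forward(X)
loss = multitask_loss(p_lin, p_ang)
print(loss)
```

The design intuition is that the motor noise carries cues shared by both velocity components, so a common representation trained against both targets can outperform two independently trained single-task regressors.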