Multimodal Speaker Recognition in a Conversation Scenario

Marchegiani, Letizia; Pirri, Fiora; Pizzoli, Matia

doi:10.1007/978-3-642-04667-4_2

Cited by 6 publications

(11 citation statements)

References 17 publications


“…Speaker Verification While traditionally this task has been addressed relying on Gaussian Mixture Models (e.g. [19,21]), recent advances in machine learning, particularly in the form of deep learning architectures (e.g. [8,9]) have dictated and driven the development of new methods able to achieve great precision, and to overcome the need of defining hand-crafted features.…”

Section: Related Work (mentioning)

confidence: 99%

“…Speaker Localisation Speaker or, more generally, sound source localisation, has followed a similar pattern, and more traditional geometrical methods [19,22] have been now superseded by deep learning approaches, such as [7,14]. Both those studies rely on cross-correlation information to train CNN-based models to perform localisation.…”

Section: Related Work (mentioning)

confidence: 99%
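
As a rough illustration of the "cross-correlation information" mentioned in the statement above, the sketch below estimates the delay between two microphone signals by picking the lag that maximises their cross-correlation. It is not the method of the cited studies: the function names, the synthetic burst, and the 3-sample offset are all assumptions made for the example, and a real system would feed such correlation features to a learned model rather than read the peak directly.

```python
import math


def cross_correlation(a, b, max_lag):
    """Return {lag: sum_n a[n] * b[n + lag]} for lags in [-max_lag, max_lag]."""
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for n, sample in enumerate(a):
            m = n + lag
            if 0 <= m < len(b):  # ignore samples that fall outside b
                total += sample * b[m]
        scores[lag] = total
    return scores


def estimate_delay(a, b, max_lag):
    """Lag (in samples) by which b trails a, taken at the correlation peak."""
    scores = cross_correlation(a, b, max_lag)
    return max(scores, key=scores.get)


# Hypothetical test signal: a decaying sinusoidal burst.
burst = [math.sin(2 * math.pi * 0.1 * n) * math.exp(-0.05 * n) for n in range(64)]

# Assumed setup: microphone B receives the burst 3 samples after microphone A.
mic_a = burst + [0.0] * 8
mic_b = [0.0] * 3 + burst + [0.0] * 5

delay = estimate_delay(mic_a, mic_b, max_lag=8)
print("estimated delay (samples):", delay)
```

With a known microphone spacing and sampling rate, a delay of this kind can be converted into a direction of arrival, which is the geometric route the statement contrasts with learned approaches.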

“…Literature in robotics has provided us with several speaker detection and recognition frameworks, most of which rely on face and voice characteristics (see [19,26] among others). Despite the accuracy of such frameworks, situations where noise might play a crucial role and heavily compromise the quality of the sound perceived by the robot have not been yet investigated.…”

Section: Introduction (mentioning)

confidence: 99%


No Need to Scream: Robust Sound-Based Speaker Localisation in Challenging Scenarios

2019


This paper is about speaker verification and horizontal localisation in the presence of conspicuous noise. Specifically, we are interested in enabling a mobile robot to robustly and accurately spot the presence of a target speaker and estimate his/her position in challenging acoustic scenarios. While several solutions to both tasks have been proposed in the literature, little attention has been devoted to the development of systems able to function in harsh noisy conditions. To address these shortcomings, in this work we follow a purely data-driven approach based on deep learning architectures which, by not requiring any knowledge of either the nature of the masking noise or the structure and acoustics of the operating environment, is able to act reliably in previously unexplored acoustic scenes. Our experimental evaluation, relying on data collected in real environments with a robotic platform, demonstrates that our framework is able to achieve high performance in both the verification and localisation tasks, despite the presence of copious noise.



“…For long time, robot audition has mainly concerned the development of human-robot interaction frameworks (e.g. [6]). More recently, the robotics community has started investigating auditory perception in a wider perspective.…”

Section: Related Work (mentioning)

confidence: 99%

“…In audio-based classification, Mel-frequency cepstrum coefficients (MFCCs) [6] have been traditionally used as feature representations of the signals. However, recent studies proved that the performance of classification systems relying on MFCCs is greatly reduced in the presence of noise [7,19].…”

Section: Feature Representation (mentioning)

confidence: 99%

Learning to Listen to Your Ego-(motion): Metric Motion Estimation from Auditory Signals

Marchegiani; Newman

2018

Towards Autonomous Robotic Systems


This paper is about robot ego-motion estimation relying solely on acoustic sensing. By equipping a robot with microphones, we investigate the possibility of employing the noise generated by the motors and actuators of the vehicle to estimate its motion. Audio-based odometry is not affected by the scene's appearance, lighting conditions, and structure. This makes sound a compelling auxiliary source of information for ego-motion modelling in environments where more traditional methods, such as those based on visual or laser odometry, are particularly challenged. By leveraging multi-task learning and deep architectures, we provide a regression framework able to estimate the linear and angular velocities at which the robot has been travelling. Our experimental evaluation, conducted on approximately two hours of data collected with an unmanned outdoor field robot, demonstrated an absolute error lower than 0.07 m/s and 0.02 rad/s for the linear and angular velocity, respectively. When compared to a baseline approach that uses a single-task learning scheme, our system shows an improvement of up to 26% in the ego-motion estimation.
