2009
DOI: 10.1007/978-3-642-04667-4_2

Multimodal Speaker Recognition in a Conversation Scenario

Abstract: As a step toward the design of a robot that can take part in a conversation, we propose a robotic system that, taking advantage of multiple perceptual capabilities, actively follows a conversation among several human subjects. The essential idea of our proposal is that the robot system can dynamically change the focus of its attention according to visual or audio stimuli to track the actual speaker throughout the conversation and infer her identity.

Cited by 6 publications (11 citation statements)
References 17 publications

“…Speaker Verification While traditionally this task has been addressed relying on Gaussian Mixture Models (e.g. [19,21]), recent advances in machine learning, particularly in the form of deep learning architectures (e.g. [8,9]) have dictated and driven the development of new methods able to achieve great precision, and to overcome the need of defining hand-crafted features.…”
Section: Related Work (mentioning)
confidence: 99%
“…Speaker Localisation Speaker or, more generally, sound source localisation, has followed a similar pattern, and more traditional geometrical methods [19,22] have been now superseded by deep learning approaches, such as [7,14]. Both those studies rely on cross-correlation information to train CNN-based models to perform localisation.…”
Section: Related Work (mentioning)
confidence: 99%
“…For long time, robot audition has mainly concerned the development of human-robot interaction frameworks (e.g. [6]). More recently, the robotics community has started investigating auditory perception in a wider perspective.…”
Section: Related Work (mentioning)
confidence: 99%
“…In audio-based classification, Mel-frequency cepstrum coefficients (MFCCs) [6] have been traditionally used as feature representations of the signals. However, recent studies proved that the performance of classification systems relying on MFCCs is greatly reduced in the presence of noise [7,19].…”
Section: Feature Representation (mentioning)
confidence: 99%