2010 Fourth IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS) 2010
DOI: 10.1109/btas.2010.5634477

Introducing crossmodal biometrics: Person identification from distinct audio & visual streams

Abstract: Person identification using audio or visual biometrics is a well-studied problem in pattern recognition. In this scenario, both training and testing are done on the same modalities. However, there can be situations where this condition is not valid, i.e. training and testing have to be done on different modalities. This could arise, for example, in covert surveillance. Is there any person-specific information common to both the audio and visual (video-only) modalities which could be exploited to identi…

Citations: cited by 4 publications (3 citation statements)
References: 19 publications
“…Le and Odobez [28] use transfer learning from face embeddings to try and improve speaker diarisation results. The only attempt we can find to solve a similar task to the one proposed here (but only for videos, and not still face images) is by [38]. This work seeks to map a statistical model of the features in one modality to a statistical model of the features in another modality.…”
Section: Related Work (mentioning)
confidence: 99%
“…Their robot uses a combination of clothes, height, and face recognition to identify enrolled individuals and follow them through an environment filled with unknown people. Other, more preliminary work, by Roy and Marcel [9], explores the reconstruction of missing audio/video recognition models from different perceptual modalities. For example, if a speaker is known only by voice, they could be recognized from lip movements in a video.…”
Section: Related Work (mentioning)
confidence: 99%
“…retrieving an image for a given text, and vice-versa. In biometrics, this is referred to as cross-modal recognition [25], [26]. Several solutions developed in multi-view learning lend themselves naturally to the task of cross-modal retrieval.…”
Section: B Related Work Elsewhere (mentioning)
confidence: 99%