Speaker Localisation Using Audio-Visual Synchrony: An Empirical Study (2003)
DOI: 10.1007/3-540-45113-7_48

Cited by 63 publications (73 citation statements)
References 9 publications
“…Then the window is classified as a human, and the HOG features [16] will be extracted and tested with the PL-SVM of the next stage to finally decide whether it is a human being or not.…”
Section: G. Two-Stage PL-SVM Classification (mentioning)
confidence: 99%
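The two-stage scheme quoted above (a cheap first stage gating a HOG + PL-SVM second stage) can be illustrated with a minimal sketch. Everything below is an assumption for illustration: synthetic arrays stand in for real detection windows, and an ordinary linear SVM (scikit-learn's LinearSVC) substitutes for the citing paper's piecewise-linear PL-SVM.

```python
# Hedged sketch of a generic two-stage detection cascade: a cheap first-stage
# classifier filters candidate windows, and only windows it accepts are
# re-scored with HOG features by a second-stage SVM.  LinearSVC stands in for
# the PL-SVM; the data and HOG parameters here are illustrative assumptions.
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic 128x64 grayscale "windows" with labels (1 = human, 0 = not).
windows = rng.random((200, 128, 64))
labels = rng.integers(0, 2, size=200)

def hog_features(win):
    # Common HOG settings; the cited paper's exact parameters are not given here.
    return hog(win, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

# Stage 1: a cheap classifier on coarsely downsampled pixels.
coarse = np.stack([w[::8, ::8].ravel() for w in windows])
stage1 = LinearSVC(dual=False).fit(coarse, labels)

# Stage 2: HOG features with a linear SVM.
feats = np.stack([hog_features(w) for w in windows])
stage2 = LinearSVC(dual=False).fit(feats, labels)

def classify(win):
    # Only windows passed by stage 1 pay the cost of HOG extraction.
    if stage1.predict(win[::8, ::8].ravel()[None])[0] == 0:
        return 0
    return int(stage2.predict(hog_features(win)[None])[0])

print(classify(windows[0]))
```

The point of the cascade is that HOG extraction, the expensive step, runs only on windows the cheap first stage accepts.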
“…We use the 22 clips from the groups set in which two speakers take turns reading digit strings and then proceed to speak simultaneously. In order to compare to [4] and [13] we only consider the section of alternating speech. In each clip both individuals face the camera at all times.…”
Section: Audio-Visual Experiments (mentioning)
confidence: 99%
“…To the best of our knowledge these results are equivalent to or better than all other reported results for speaker labeling on the CUAVE group set. Nock and Iyengar [4] obtain 75% accuracy with a windowed Gaussian MI measure and Gurban and Thiran [13] get 87.4% with a trained audio-visual speech detector. Both methods use a silence/speech detector and only perform a dependence test when there is speech.…”
Section: Audio-Visual Experiments (mentioning)
confidence: 99%
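The windowed Gaussian MI measure attributed to Nock and Iyengar [4] has a compact closed form: for jointly Gaussian audio features A and visual features V, I(A;V) = ½ log(det Σ_A · det Σ_V / det Σ_AV). The sketch below is a generic reconstruction under that Gaussian assumption; the feature dimensions, window length, and toy data are illustrative, not taken from either cited paper.

```python
# Hedged sketch of a windowed Gaussian mutual-information score between an
# audio stream and a visual stream, as in the MI-based method cited above.
import numpy as np

def gaussian_mi(a, v):
    """MI between audio features a (T x da) and visual features v (T x dv),
    under a joint Gaussian model:
        I = 0.5 * log( det(C_a) * det(C_v) / det(C_joint) )
    """
    joint = np.hstack([a, v])
    c = np.cov(joint, rowvar=False)
    da = a.shape[1]
    _, logdet_joint = np.linalg.slogdet(c)
    _, logdet_a = np.linalg.slogdet(c[:da, :da])
    _, logdet_v = np.linalg.slogdet(c[da:, da:])
    return 0.5 * (logdet_a + logdet_v - logdet_joint)

def windowed_mi(a, v, win=30, hop=15):
    # Slide a window over time and score each segment independently.
    return [gaussian_mi(a[t:t + win], v[t:t + win])
            for t in range(0, len(a) - win + 1, hop)]

# Toy check: audio and video driven by a shared source score high MI.
rng = np.random.default_rng(0)
shared = rng.standard_normal((300, 1))
audio = shared + 0.1 * rng.standard_normal((300, 1))
video = shared + 0.1 * rng.standard_normal((300, 2))
print(np.mean(windowed_mi(audio, video)))
```

With a silence/speech gate in front, as both cited methods use, these per-window scores would only be computed during speech.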