2012
DOI: 10.1007/978-3-642-27355-1_7
Multimodal Video Concept Detection via Bag of Auditory Words and Multiple Kernel Learning

Cited by 13 publications (13 citation statements). References 12 publications.
“…Finally, the results show that CAT and MKL react similarly when reducing the size of the training set. This is in disagreement with the literature (see [3]). Notice, however, that the experimental conditions are not the same.…”
Section: Results (contrasting)
confidence: 97%
“…It is also worth noticing that the MKL and the CAT methods are comparable and both perform better than the CWS approach. This last statement is in disagreement with [3], where MKL outperforms CAT. The experimental conditions, however, are not the same.…”
Section: Results (mentioning)
confidence: 79%
“…In early fusion the audio and the visual features are combined before classification [12], while in late fusion the classification scores from the individual feature models are combined [9,11,16]. Kernel fusion can be considered an intermediate fusion: the audio and the visual features are merged at the kernel level before performing the classification [1,17]. These methods fuse the audio and visual modalities without considering their correlations.…”
Section: Related Work (mentioning)
confidence: 99%
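The citation above contrasts early, late, and kernel-level fusion of audio and visual features. The sketch below is a minimal illustration of those three strategies under stated assumptions, not the cited paper's implementation: the feature matrices X_audio and X_visual, the labels y, and the fixed 0.5 kernel weights are hypothetical (a full multiple kernel learning solver would learn the kernel weights rather than fix them).

```python
# Minimal sketch of early, late, and kernel-level (intermediate) fusion of
# audio and visual features. All data below is synthetic and the kernel
# weights are fixed; this is an assumption-laden illustration, not the
# paper's method.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import chi2_kernel

rng = np.random.default_rng(0)
X_audio = rng.random((100, 64))    # hypothetical audio bag-of-words histograms
X_visual = rng.random((100, 256))  # hypothetical visual bag-of-words histograms
y = rng.integers(0, 2, 100)        # hypothetical binary concept labels

# Early fusion: concatenate the modality features before classification.
early_clf = SVC(kernel="rbf").fit(np.hstack([X_audio, X_visual]), y)

# Late fusion: train one classifier per modality, then combine their scores.
clf_a = SVC(kernel="rbf", probability=True).fit(X_audio, y)
clf_v = SVC(kernel="rbf", probability=True).fit(X_visual, y)
late_scores = 0.5 * clf_a.predict_proba(X_audio)[:, 1] \
            + 0.5 * clf_v.predict_proba(X_visual)[:, 1]

# Kernel (intermediate) fusion: build one kernel per modality, combine the
# kernels, then classify on the fused kernel. An MKL solver would learn the
# weights; here they are simply fixed at 0.5 each.
K_audio = chi2_kernel(X_audio)
K_visual = chi2_kernel(X_visual)
K_fused = 0.5 * K_audio + 0.5 * K_visual
kernel_clf = SVC(kernel="precomputed").fit(K_fused, y)
print(kernel_clf.score(K_fused, y))
```

In the setting of the indexed paper the audio features would be bag-of-auditory-words histograms and the visual features visual bag-of-words histograms; the χ² kernel is a common choice for such histograms but is used here only as an example of a per-modality kernel.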