2013
DOI: 10.1007/s11263-013-0662-8

Learning Discriminative Space–Time Action Parts from Weakly Labelled Videos

Cited by 54 publications (23 citation statements)
References 40 publications
“…It can be seen that the two-layer model performed better than the low-level encoding with different descriptors. In addition, we compare the proposed method with a few classic methods on the KTH, J-HMDB, and YouTube datasets, such as DT + BoVW [1], mid-level parts [21], traditional FV [17], stacked FV [17], DT + BOW [10], and IDT + FV [17]. As shown in Table 2, the two-layer model obtains 67.4% and 87.6% accuracy on the J-HMDB and YouTube datasets, respectively.…”
Section: Experiments Result (mentioning)
confidence: 99%
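The statement above compares low-level descriptor encodings (BoVW, Fisher vectors) against the cited mid-level-parts method. As a point of reference only, here is a minimal bag-of-visual-words encoding sketch, assuming local video descriptors (e.g. dense-trajectory HOG/HOF/MBH) have already been extracted; the arrays, vocabulary size, and classifier stage are illustrative assumptions, not details taken from any of the cited papers.

```python
# Minimal bag-of-visual-words (BoVW) encoding sketch. Assumption: local
# descriptors (e.g. dense-trajectory MBH, dimension 96) are already extracted.
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(train_descriptors, k=512, seed=0):
    """Cluster pooled training descriptors into a visual vocabulary of k words."""
    return KMeans(n_clusters=k, random_state=seed, n_init=4).fit(train_descriptors)

def bovw_encode(video_descriptors, codebook):
    """Histogram of nearest codewords, L1-normalised, one vector per video."""
    words = codebook.predict(video_descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(np.float64)
    return hist / max(hist.sum(), 1.0)

# Hypothetical usage with random stand-in data.
train = np.random.rand(100_000, 96)      # pooled training descriptors
codebook = build_codebook(train, k=512)
video = np.random.rand(2_000, 96)        # descriptors of a single video
encoding = bovw_encode(video, codebook)  # would feed a linear or chi-squared SVM
```

Fisher-vector encodings replace the hard histogram with GMM gradient statistics, which is why they typically outperform plain BoVW in the comparisons quoted here.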
“…Accuracy on KTH, YouTube, and J-HMDB:
KTH: ISA [15] 86.5%; Yeffet and Wolf [18] 90.1%; Cheng et al. [20] 89.7%; Le et al. [15] 93.9%; Two-layer model 92.6%.
YouTube: Liu et al. [16] 71.2%; Ikizler-Cinbis and Sclaroff [19] 75.21%; DT + BoVW [1] 85.4%; Mid-level parts [21] 84.5%; Two-layer model 87.6%.
J-HMDB: Traditional FV [17] 62.83%; Stacked FV [17] 59.27%; DT + BOW [10] 56.6%; IDT + FV [17] 62.8%; Two-layer model 67.4%.…”
Section: KTH (mentioning)
confidence: 99%
“…We show the classification accuracy in Table 3 (comparison of our proposed S-T Saliency to the state-of-the-art on both datasets):

Algorithms: UCF11 / HMDB51
HoG [3]: 74.5 / 40.2
HoF [3]: 72.8 / 48.9
MBH [3]: 83.9 / 52.1
DT [3]: 84.2 / 54.7
iDT [18], [19]: 90.7 / 57.2
Mid-level parts [20]: 84.5 / 37.2
CompactFV [5]: 89 / 54.8
Our best: 90.6 / 60.1

[3] is the original dense trajectory; [18] is the improved dense trajectories, which compensate for the camera motion by using RANSAC [21] to compute a homography and remove the camera motion in each frame, and also use Fisher vectors [22] for feature encoding. Oneata et al.…”
Section: Comparison With State-of-the-art (mentioning)
confidence: 99%
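The statement describes improved dense trajectories compensating for camera motion by estimating a RANSAC homography between consecutive frames. The sketch below illustrates that general idea with OpenCV; the ORB features, matcher settings, and thresholds are assumptions for illustration and are not the cited authors' exact pipeline.

```python
# Sketch of frame-to-frame camera-motion compensation via a RANSAC homography,
# in the spirit of improved dense trajectories (illustrative, not the cited
# pipeline; feature detector and thresholds are assumptions).
import cv2
import numpy as np

def compensate_camera_motion(prev_gray, curr_gray):
    """Estimate a homography prev->curr with RANSAC and warp prev to cancel it."""
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)
    if des1 is None or des2 is None:
        return prev_gray, np.eye(3)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    if len(matches) < 4:
        return prev_gray, np.eye(3)
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=1.0)
    if H is None:
        return prev_gray, np.eye(3)
    h, w = curr_gray.shape
    warped_prev = cv2.warpPerspective(prev_gray, H, (w, h))
    # Optical flow computed on (warped_prev, curr_gray) is now largely free of
    # global camera motion, so trajectories capture object motion only.
    return warped_prev, H
```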
“…[5] extracts spatial Fisher vectors based on MBH and SIFT descriptors. [20] uses a deformable model to learn space-time parts.…”
Section: Comparison With State-of-the-art (mentioning)
confidence: 99%
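Since Fisher vectors are mentioned here, a minimal Fisher-vector encoding sketch may help: it computes the standard first- and second-order gradient statistics of a diagonal-covariance GMM with power and L2 normalisation. The component count and input arrays are assumptions, and no spatial layout is modelled (unlike the spatial Fisher vectors of [5]).

```python
# Minimal Fisher-vector encoding sketch (mean/variance gradient statistics of
# a diagonal-covariance GMM); component count and inputs are assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmm(train_descriptors, n_components=64, seed=0):
    return GaussianMixture(n_components=n_components, covariance_type="diag",
                           random_state=seed).fit(train_descriptors)

def fisher_vector(descriptors, gmm):
    """Improved FV: first/second-order statistics, power- and L2-normalised."""
    X = np.atleast_2d(descriptors)                    # (N, D)
    N, D = X.shape
    q = gmm.predict_proba(X)                          # (N, K) soft assignments
    pi, mu, var = gmm.weights_, gmm.means_, gmm.covariances_
    diff = (X[:, None, :] - mu[None, :, :]) / np.sqrt(var)[None, :, :]  # (N, K, D)
    g_mu = (q[:, :, None] * diff).sum(0) / (N * np.sqrt(pi)[:, None])
    g_var = (q[:, :, None] * (diff ** 2 - 1)).sum(0) / (N * np.sqrt(2 * pi)[:, None])
    fv = np.hstack([g_mu.ravel(), g_var.ravel()])     # 2*K*D dimensions
    fv = np.sign(fv) * np.sqrt(np.abs(fv))            # power normalisation
    return fv / max(np.linalg.norm(fv), 1e-12)        # L2 normalisation
```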
“…Besides other domains such as scene and object recognition and tracking [26, 5], MIL has been used for the categorization of singleton human actions in [2, 22, 43]. Prabhakar and Rehg [36] use multiple instance learning to infer the labels of causal sets which temporally co-occur in turn-taking interactions.…”
Section: Related Work (mentioning)
confidence: 99%
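The surveyed paper learns discriminative space-time parts from weakly labelled videos via multiple instance learning. As an illustration of the MIL setting only (not the paper's deformable part model), the toy baseline below alternates between training an instance classifier and re-labelling the top-scoring instance in each positive bag, then scores a bag by its best instance; all names and parameters are hypothetical.

```python
# Toy multiple-instance-learning baseline: a video (bag) is positive if its
# best-scoring segment (instance) is. Illustrative max-pooling MIL heuristic,
# not the surveyed paper's discriminative space-time part model.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_mil_max(bags, bag_labels, n_iters=5):
    """bags: list of (n_i, D) arrays; bag_labels: 0/1 per bag (weak labels)."""
    X = np.vstack(bags)
    # Initialise by giving every instance its bag's label.
    y = np.concatenate([np.full(len(b), l) for b, l in zip(bags, bag_labels)])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    for _ in range(n_iters):
        # Re-label: in each positive bag, keep only the top-scoring instance positive.
        labels = []
        for b, l in zip(bags, bag_labels):
            lbl = np.zeros(len(b))
            if l == 1:
                lbl[np.argmax(clf.decision_function(b))] = 1
            labels.append(lbl)
        clf = LogisticRegression(max_iter=1000).fit(X, np.concatenate(labels))
    return clf

def predict_bag(clf, bag):
    """A bag is positive if its highest instance score exceeds the decision boundary."""
    return int(clf.decision_function(bag).max() > 0)
```

The alternation mirrors the general mi-SVM style of weakly supervised training: bag labels are known, instance labels are latent and re-estimated each round.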