2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2015.7298636
Modeling local and global deformations in Deep Learning: Epitomic convolution, Multiple Instance Learning, and sliding window detection

Cited by 198 publications (149 citation statements) · References 19 publications
“…This is achieved by stochastic gradient descent training using the network architecture and cost function described in Section 3, which explicitly searches for the best candidate position of the object in the image using the global max-pooling operation. We also search over object scales (similar to [40]) by training from images of different sizes. The training procedure is illustrated in Figure 2.…”
Section: Weakly Supervised Learning and Classification
confidence: 99%
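The statement above describes weakly supervised training that selects the best candidate object position via global max-pooling over a spatial score map. As an illustrative sketch (not the paper's implementation; the score map and its values here are hypothetical), global max-pooling reduces to taking the maximum score and recording where it occurs:

```python
import numpy as np

def global_max_pool(score_map):
    """Global max-pooling over spatial positions: return the best score
    and its (row, col) location in a per-class score map."""
    idx = np.unravel_index(np.argmax(score_map), score_map.shape)
    return score_map[idx], idx

# Hypothetical 2-D class score map produced by a conv net.
scores = np.array([[0.1, 0.3],
                   [0.9, 0.2]])
best, loc = global_max_pool(scores)
# The gradient of a loss on `best` flows only through position `loc`,
# which is how training "searches" for the object location.
```

Searching over scales, as the quote notes, amounts to applying the same pooling to score maps computed from resized inputs and again taking the maximum.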
“…We first filter the left image and reconstruction error, and the left disparity and geometric error map E_g, independently, using one layer of convolution followed by batch normalization. Both results are then concatenated and passed through atrous convolution [18] to sample from a larger context without increasing the network size. We use dilations with rates 1, 2, 4, 8, 1, and 1, respectively.…”
Section: Disparity Refinement
confidence: 99%
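The dilation rates quoted above (1, 2, 4, 8, 1, 1) control how far apart the taps of each atrous convolution are spaced. A minimal 1-D numpy sketch (an illustration of the operation, not the cited network) makes the effect concrete: the kernel keeps the same number of weights while its receptive field grows with the rate.

```python
import numpy as np

def atrous_conv1d(x, w, rate):
    """1-D atrous (dilated) convolution: taps of w are spaced `rate`
    samples apart, enlarging the receptive field without extra weights."""
    k = len(w)
    span = (k - 1) * rate + 1          # effective kernel extent
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(w[j] * x[i + j * rate] for j in range(k))
    return out

x = np.arange(8, dtype=float)          # [0, 1, ..., 7]
w = np.array([1.0, 1.0, 1.0])
y1 = atrous_conv1d(x, w, 1)   # rate 1: ordinary convolution, span 3
y2 = atrous_conv1d(x, w, 2)   # rate 2: taps at i, i+2, i+4, span 5
```

Stacking layers with increasing rates (1, 2, 4, 8) therefore grows the context exponentially while the parameter count grows only linearly, which is the point of the quoted design.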
“…We appropriately place interpolation layers to ensure that results from different skip layers have commensurate dimensions, while, as in [40,10], we use atrous convolution to increase the spatial resolution of high-level neurons. Finally, to account for the varying face sizes in images we employ a 3-scale pyramid of our proposed network where at scales 2 & 3 we down-sample the image by half and a quarter times respectively by using a 2D average pooling operation, similar to [33].…”
Section: Model
confidence: 99%
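The quoted model builds a 3-scale pyramid by halving and quartering the image with 2D average pooling. A small numpy sketch of that down-sampling step (illustrative only; the actual network's pooling layer is not reproduced here):

```python
import numpy as np

def avg_pool_2x(img):
    """Down-sample a 2-D image by half with 2x2 average pooling,
    as used to build the coarser scales of an image pyramid."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2  # crop to even size
    img = img[:h, :w]
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

img = np.arange(16, dtype=float).reshape(4, 4)
scale2 = avg_pool_2x(img)        # half resolution
scale3 = avg_pool_2x(scale2)     # quarter resolution
```

Running the same network on all three scales lets a fixed-size detector respond to faces of varying sizes, which is the motivation stated in the quote.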