2010
DOI: 10.1007/978-3-642-15825-4_10

Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition

Abstract: A common practice to gain invariant features in object recognition models is to aggregate multiple low-level features over a small neighborhood. However, the differences between those models make a comparison of the properties of different aggregation functions hard. Our aim is to gain insight into different functions by directly comparing them on a fixed architecture for several common object recognition tasks. Empirical results show that a maximum pooling operation significantly outperforms subsampling operations.

Cited by 1,079 publications (658 citation statements)
References 19 publications
“…5.5, 5.16) (Ranzato et al, 2007). Advantages of doing this were pointed out subsequently (Scherer et al, 2010). BP-trained MPCNNs have become central to many modern, competition-winning, feedforward, visual Deep Learners (Sec.…”
Section: UL-Based History Compression Through a Deep Stack of RNNs (mentioning)
confidence: 99%
“…The biggest architectural difference between our DNN and the CNN of LeCun et al (1998) is the use of max-pooling layers (Riesenhuber & Poggio, 1999; Serre et al, 2005; Scherer et al, 2010) instead of sub-sampling layers. The output of a max-pooling layer is given by the maximum activation over non-overlapping rectangular regions of size K_x × K_y.…”
Section: Max-Pooling Layer (mentioning)
confidence: 99%
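
A minimal NumPy sketch of the operation described in this excerpt: the maximum activation over non-overlapping K_x × K_y regions. The helper name max_pool and the example sizes are illustrative assumptions, not taken from the cited work, and the sketch assumes the map's height and width divide evenly by K_x and K_y.

import numpy as np

def max_pool(x, kx, ky):
    """Max over non-overlapping kx-by-ky regions of a 2-D feature map (illustrative helper)."""
    h, w = x.shape
    assert h % kx == 0 and w % ky == 0, "sketch assumes the map tiles exactly into kx-by-ky regions"
    # Reshape so each pooling region gets its own pair of axes, then reduce over those axes.
    return x.reshape(h // kx, kx, w // ky, ky).max(axis=(1, 3))

# Example: a 4x6 map pooled with 2x3 regions yields a 2x2 map.
fm = np.arange(24, dtype=float).reshape(4, 6)
print(max_pool(fm, 2, 3))  # [[ 8. 11.] [20. 23.]]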
“…These findings in conjunction with experimental studies of the visual cortex justify the use of such filters in the so-called standard model for object recognition (Riesenhuber & Poggio, 1999;Serre et al, 2005;Mutch & Lowe, 2008), whose filters are fixed, in contrast to those of Convolutional Neural Networks (CNNs) (LeCun et al, 1998;Behnke, 2003;Simard et al, 2003), whose weights (filters) are randomly initialized and learned in a supervised way using back-propagation (BP). A DNN, the basic building block of our proposed MCDNN, is a hierarchical deep neural network, alternating convolutional with max-pooling layers (Riesenhuber & Poggio, 1999;Serre et al, 2005;Scherer et al, 2010). A single DNN of our team won the offline Chinese character recognition competition (Liu et al, 2011), a classification problem with 3755 classes.…”
Section: Introduction (mentioning)
confidence: 99%
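
As a rough illustration of the alternating structure this excerpt describes (convolutional layers interleaved with max-pooling layers), the following toy sketch stacks two such stages. The layer count, filter sizes, tanh nonlinearity, and helper names (conv2d_valid, max_pool) are assumptions for illustration only, not the architecture of the cited MCDNN.

import numpy as np

def conv2d_valid(x, w):
    """'Valid' 2-D cross-correlation of a single-channel map with one filter (illustrative helper)."""
    fh, fw = w.shape
    out = np.empty((x.shape[0] - fh + 1, x.shape[1] - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + fh, j:j + fw] * w)
    return out

def max_pool(x, k):
    """Max over non-overlapping k-by-k regions."""
    h, w = x.shape
    return x.reshape(h // k, k, w // k, k).max(axis=(1, 3))

rng = np.random.default_rng(0)
x = rng.standard_normal((29, 29))       # small single-channel input, size chosen for illustration
for fsize, psize in [(6, 2), (5, 2)]:   # two conv/pool stages; filter and pool sizes are assumptions
    x = np.tanh(conv2d_valid(x, rng.standard_normal((fsize, fsize))))
    x = max_pool(x, psize)
print(x.shape)  # spatial resolution shrinks after every conv/pool stage: (4, 4)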
“…Recent advances leading to success in the ImageNet challenge stem from dropout to prevent overfitting [3], rectifying linear units for improved convergence, backpropagation through max-pooling [4] and GPU implementations for speed, all of which are also used in this work.…”
Section: Related Work (mentioning)
confidence: 99%
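
A sketch of what "backpropagation through max-pooling" amounts to, under the usual convention that the gradient of each pooled output flows only to the input position that attained the maximum in the forward pass. The helper max_pool_backward and the example values are assumptions for illustration, not the cited implementation.

import numpy as np

def max_pool_backward(grad_out, x, k):
    """Route each upstream gradient to the argmax position of its k-by-k pooling region."""
    grad_in = np.zeros_like(x)
    for i in range(x.shape[0] // k):
        for j in range(x.shape[1] // k):
            region = x[i * k:(i + 1) * k, j * k:(j + 1) * k]
            r, c = np.unravel_index(np.argmax(region), region.shape)
            grad_in[i * k + r, j * k + c] = grad_out[i, j]
    return grad_in

# Forward pass: 2x2 max pooling of an arbitrary 4x4 map.
x = np.array([[1., 5., 2., 0.],
              [3., 2., 4., 1.],
              [0., 1., 2., 2.],
              [6., 0., 3., 1.]])
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))

# Backward pass: only the positions that produced the maxima (5, 4, 6, 3) receive gradient.
print(max_pool_backward(np.ones_like(pooled), x, 2))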