This paper devises a new means of filter diversification, dubbed multi-fold filter convolution (M-FFC), for face recognition. When M-FFC receives single-scale Gabor filters of varying orientations as input, these filters are self-cross convolved M-fold to instantiate a filter offspring set. The flexibility of M-FFC also permits cross convolution between Gabor filters and other filter banks with profoundly dissimilar traits, e.g., principal component analysis (PCA) filters and independent component analysis (ICA) filters. The 2-FFC of Gabor, PCA and ICA filters thus yields three offspring sets: (1) Gabor filters solely, (2) Gabor-PCA filters, and (3) Gabor-ICA filters, from which the learning-free (Gabor) and the learning-based (Gabor-PCA, Gabor-ICA) 2-FFC descriptors are derived. To facilitate a sensible Gabor filter selection for M-FFC, the 40 multiscale, multi-orientation Gabor filters are condensed into 8 elementary filters. In addition, an average histogram pooling operator is employed to pool the M-FFC histogram features prior to the final whitening PCA compression. The empirical results substantiate that the 2-FFC descriptors outperform, or perform on par with, other face descriptors on both identification and verification tasks.
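To make the multi-fold convolution idea concrete, the following is a minimal NumPy sketch built on several assumptions that the abstract does not state: real (cosine) Gabor kernels with illustrative parameters; "self-cross convolution" read as full 2-D convolution over every unordered combination (with repetition) of the 8 orientations, so that 2-fold yields 36 offspring because convolution is commutative; and PCANet-style PCA filters learned from mean-removed patches. All helper names (`gabor_bank`, `ffc_offspring`, `cross_offspring`, `pca_filters`) are hypothetical, random arrays stand in for face images, and this is not the authors' implementation.

```python
import numpy as np
from itertools import combinations_with_replacement


def conv2_full(a, b):
    """Full 2-D linear convolution of two real arrays via zero-padded FFTs."""
    h = a.shape[0] + b.shape[0] - 1
    w = a.shape[1] + b.shape[1] - 1
    return np.fft.irfft2(np.fft.rfft2(a, s=(h, w)) * np.fft.rfft2(b, s=(h, w)), s=(h, w))


def gabor_kernel(size, theta, wavelength=6.0, sigma=3.0, gamma=0.5):
    """Real (cosine) Gabor kernel; parameter values are illustrative assumptions."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)     # rotate coordinates by theta
    yr = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / wavelength)
    g -= g.mean()                                   # remove the DC response
    return g / np.linalg.norm(g)                    # unit L2 norm


def gabor_bank(n_orient=8, size=15):
    """Hypothetical single-scale bank: n_orient orientations evenly spaced in [0, pi)."""
    return [gabor_kernel(size, k * np.pi / n_orient) for k in range(n_orient)]


def ffc_offspring(bank, folds=2):
    """Self-cross convolve a bank `folds` times (unordered combinations, with repetition)."""
    offspring = []
    for combo in combinations_with_replacement(range(len(bank)), folds):
        f = bank[combo[0]]
        for idx in combo[1:]:
            f = conv2_full(f, bank[idx])
        offspring.append(f)
    return offspring


def cross_offspring(bank_a, bank_b):
    """2-fold cross convolution between two different banks (all pairs)."""
    return [conv2_full(a, b) for a in bank_a for b in bank_b]


def pca_filters(images, size, k):
    """PCANet-style PCA filters: leading principal directions of mean-removed patches."""
    patches = []
    for img in images:
        rows, cols = img.shape
        for i in range(rows - size + 1):
            for j in range(cols - size + 1):
                p = img[i:i + size, j:j + size].ravel()
                patches.append(p - p.mean())
    _, _, vt = np.linalg.svd(np.asarray(patches), full_matrices=False)
    return [vt[i].reshape(size, size) for i in range(k)]


bank = gabor_bank()                          # 8 Gabor filters, 15x15
gabor_2ffc = ffc_offspring(bank, folds=2)    # 36 offspring filters, 29x29
rng = np.random.default_rng(0)
faces = [rng.random((20, 20)) for _ in range(5)]  # stand-in for training faces
pca = pca_filters(faces, size=7, k=4)        # 4 PCA filters, 7x7
gabor_pca = cross_offspring(bank, pca)       # 32 Gabor-PCA filters, 21x21
```

ICA filters, learned with any ICA routine, would be passed to `cross_offspring` in the same way; the condensation of the 40-filter Gabor bank into 8 elementary filters is not reproduced here.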
Index Terms: Gabor filters, PCA filters, ICA filters, filter convolution, face recognition

Hong Kong learns from approximately 300,000 images with 13,000 identities; FaceNet [5] by Google trains CNNs on 200M images spanning over 8M identities. These prevailing CNN models, particularly DeepID3 and FaceNet, reportedly achieve accuracies of 99.53% and 99.63%, respectively, on the Labeled Faces in the Wild (LFW) dataset [41], surpassing the human-level performance of 97.53%. In contrast, filter bank (FB) approaches, e.g., PCANet [14], discriminant face descriptor (DFD) [15], compact binary face descriptor (CBFD) [16], binarized statistical image features (BSIF) [17-18], DCTNet [20], etc., typically comprise only one or two filtering layers. Despite being simple and easy to use, these CNN simplifications offer state-of-the-art robustness on generic image classification problems, including face recognition.

The earliest FB approaches are reviewed and compared in [6]. They share a common three-stage pipeline, referred to as filter-rectify-filter (FRF): (1) a convolutional stage based on heuristically designed filter banks, e.g., Laws masks, ring and wedge filters, Gabor filters, wavelet transforms, packets and frames, discrete cosine transform (DCT), etc., or on other optimal filters, e.g., principal component analysis (PCA) eigenfilters, Karhunen-Loeve transform, prediction error filters, optimized Gabor filters, etc.; (2) a nonlinearity, a.k.a. a filter-response rectification step, e.g., magnitude, squaring, rectified sigmoid, etc.; (3) pooling (filtering) operations, e.g., spatial averaging, smoothing, or nonlinear inhibition, to remove the inhomogeneity in the rectified responses within a homogeneous region. The local energy function, which comprises stages (2) and (3), outputs a set of feature images, one per filter, def...
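The three FRF stages can be sketched in a few lines of NumPy. This is a minimal illustration, assuming Laws masks (outer products of the 1-D Level, Edge, Spot and Ripple vectors) as the stage-1 bank, squaring or magnitude as the rectifier, and a 5×5 box filter as the spatial-averaging stage with zero padding at the borders. The helper names (`conv2_same`, `laws_bank`, `frf`) and the window size are hypothetical choices, not the configuration of any specific approach reviewed in [6].

```python
import numpy as np


def conv2_same(img, kernel):
    """2-D convolution with zero padding, cropped to the input size (odd kernels assumed)."""
    kh, kw = kernel.shape
    h, w = img.shape[0] + kh - 1, img.shape[1] + kw - 1
    full = np.fft.irfft2(np.fft.rfft2(img, s=(h, w)) * np.fft.rfft2(kernel, s=(h, w)), s=(h, w))
    return full[kh // 2: kh // 2 + img.shape[0], kw // 2: kw // 2 + img.shape[1]]


# Laws 1-D vectors: Level, Edge, Spot, Ripple
LAWS = {
    "L5": np.array([1, 4, 6, 4, 1], dtype=float),
    "E5": np.array([-1, -2, 0, 2, 1], dtype=float),
    "S5": np.array([-1, 0, 2, 0, -1], dtype=float),
    "R5": np.array([1, -4, 6, -4, 1], dtype=float),
}


def laws_bank():
    """16 Laws masks built as outer products of the 1-D vectors (stage-1 filter bank)."""
    return {a + b: np.outer(va, vb) for a, va in LAWS.items() for b, vb in LAWS.items()}


def frf(img, bank, rectify="square", pool_size=5):
    """Filter-rectify-filter: returns one pooled feature image per filter."""
    box = np.ones((pool_size, pool_size)) / pool_size**2   # spatial-averaging kernel
    features = {}
    for name, kernel in bank.items():
        response = conv2_same(img, kernel)                   # (1) filtering
        if rectify == "square":                              # (2) rectification
            response = response**2
        elif rectify == "magnitude":
            response = np.abs(response)
        else:
            raise ValueError(f"unknown rectification: {rectify}")
        features[name] = conv2_same(response, box)           # (3) pooling
    return features


img = np.ones((32, 32))           # flat test image
energy = frf(img, laws_bank())    # 16 feature images, each 32x32
```

On a flat image, only the zero-mean-free L5L5 mask responds in the interior, so the zero-sum masks yield (near-)zero energy there; replacing the box window with a Gaussian would give the "smoothing" variant of stage (3).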