The automated and accurate classification of images of Human Epithelial type 2 (HEp-2) cells is one of the most important steps in the diagnosis of many autoimmune diseases. The extreme intra-class variation in HEp-2 cell image datasets drastically complicates the classification task. In this work we propose a classification framework that, unlike most state-of-the-art methods, uses deep learning-based feature extraction in a strictly unsupervised way. The framework performs hybrid feature learning with two levels of deep convolutional autoencoders. The first-level network takes the original cell images as inputs and learns to reconstruct them, in order to capture features related to the global shape of the cells. The second-level network takes the gradients of the images, in order to encode the localized changes in intensity (gray-level variations) that characterize each cell type. A final feature vector is constructed by combining the latent representations extracted from the two networks, giving a highly discriminative feature representation. These features are then fed to a nonlinear classifier whose output is the type of the cell image. We tested the discriminability of the proposed features on two of the most popular HEp-2 cell classification datasets, SNPHEp-2 and ICPR 2016. The results show that the proposed features capture the distinctive characteristics of the different cell types while performing at least as well as current deep learning-based state-of-the-art methods in terms of discrimination.
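As a rough illustration of the data flow described above, the following minimal sketch prepares the gradient input for the second network and combines two latent codes into one feature vector. It rests on several assumptions that are not stated in the text: the gradient is taken as a finite-difference gradient magnitude, "combining" is read as simple concatenation, and the latent codes are placeholder arrays of a hypothetical size (16) standing in for real encoder outputs. The function names `gradient_magnitude` and `combine_latents` are illustrative only.

```python
import numpy as np


def gradient_magnitude(image: np.ndarray) -> np.ndarray:
    """Return the per-pixel gradient magnitude of a 2-D grayscale image.

    Assumption: finite differences via np.gradient; the text does not
    say which gradient operator is actually used.
    """
    # np.gradient returns derivatives along axis 0 (rows) then axis 1 (columns).
    gy, gx = np.gradient(image.astype(np.float64))
    return np.hypot(gx, gy)


def combine_latents(z_image: np.ndarray, z_gradient: np.ndarray) -> np.ndarray:
    """Combine the two latent codes into a single feature vector.

    Assumption: "combining" is interpreted here as concatenation.
    """
    return np.concatenate([z_image.ravel(), z_gradient.ravel()])


# Toy input: a horizontal intensity ramp (pixel value equals column index).
ramp = np.tile(np.arange(8, dtype=np.float64), (8, 1))
grad = gradient_magnitude(ramp)

# Placeholder latent codes standing in for the outputs of the two encoders
# (hypothetical dimensionality, chosen only for illustration).
z_image = np.zeros(16)
z_gradient = np.ones(16)
features = combine_latents(z_image, z_gradient)
print(grad.shape, features.shape)
```

In a full pipeline, `z_image` and `z_gradient` would come from the bottleneck layers of the two trained autoencoders, and `features` would be passed to the nonlinear classifier.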