Principal component analysis (PCA) and its variants remain the primary tools for feature extraction (FE) in the remote sensing community. This is unfortunate, as there are strong arguments against using PCA for this purpose: it is inherently linear, and its principal components are not necessarily informative. Several critical issues therefore remain when PCA is used for hyperspectral image (HSI) classification, among them 1) the large number of spectral channels relative to the small number of training samples and 2) the nonlinearity of hyperspectral data. To alleviate these problems, this paper employs a new information-theoretic FE method, kernel entropic component analysis (KECA), which not only extracts more nonlinear information but also adapts to limited-sample settings. A theorem on the pivoted Cholesky decomposition is also introduced to improve the efficiency of KECA. The optimized version extracts spectral-spatial features more rapidly, particularly for large-scale HSIs, while effectively maintaining the clustering performance of KECA. Experiments on several real HSIs verify the effectiveness of the new method, paired with a support vector machine (SVM) classifier, in comparison with other PCA-based and state-of-the-art HSI classification algorithms. The code will also be made publicly available.
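To make the difference between KECA and kernel PCA concrete, the following is a minimal sketch of the standard KECA formulation from the literature, not the optimized pivoted-Cholesky variant proposed in this paper. It assumes a Gaussian (RBF) kernel; the function names `rbf_kernel` and `keca` and the bandwidth parameter `sigma` are illustrative choices, not taken from the authors' code. Whereas kernel PCA keeps the eigenpairs with the largest eigenvalues, KECA ranks each eigenpair (λ_i, e_i) of the kernel matrix by its contribution λ_i (e_i^T 1)^2 to the Rényi quadratic entropy estimate and keeps the top-ranked ones.

```python
import numpy as np

def rbf_kernel(X, sigma):
    """Gaussian kernel matrix for the rows of X (assumed kernel choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)  # guard against tiny negative distances
    return np.exp(-d2 / (2.0 * sigma ** 2))

def keca(X, n_components, sigma):
    """Illustrative KECA: select eigenpairs by entropy contribution."""
    K = rbf_kernel(X, sigma)
    eigvals, eigvecs = np.linalg.eigh(K)    # K is symmetric
    eigvals = np.clip(eigvals, 0.0, None)   # remove round-off negatives
    # Entropy contribution of eigenpair i: lambda_i * (e_i^T 1)^2
    contrib = eigvals * eigvecs.sum(axis=0) ** 2
    order = np.argsort(contrib)[::-1][:n_components]
    # Projected training samples: rows are samples, columns are components
    features = eigvecs[:, order] * np.sqrt(eigvals[order])
    return features, contrib[order], K
```

In this sketch the full eigendecomposition costs O(N^3) in the number of pixels N, which is the kind of bottleneck a low-rank factorization of the kernel matrix is meant to relieve. The returned `features` could then be passed to any classifier, such as an SVM.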