This paper considers generalized linear models (GLMs) in a data-rich environment in which a large number of potentially useful explanatory variables are available. In particular, it deals with the case in which the sample size and the number of explanatory variables are of similar magnitude. We adopt the idea that the relevant information of the explanatory variables concerning the dependent variable can be represented by a small number of common factors, and we investigate the issue of selecting the number of common factors while taking into account the effect of estimated regressors. We develop an information criterion that allows for mis-specification of both the distributional and structural assumptions, and we show that the proposed criterion is a natural extension of the Akaike information criterion (AIC). Simulations and an empirical data analysis demonstrate that the proposed criterion outperforms both the AIC and the Bayesian information criterion.

Penalized estimation methods such as the LASSO for linear regression may be used to overcome this difficulty; see Tibshirani [2]. In this paper, we focus on GLMs and take another approach to achieve dimension reduction. Specifically, we adopt the idea of principal component regression and assume that a small number of common factors of the explanatory variables are sufficient to describe the relevant information concerning the dependent variable. The common factors are latent variables and must be constructed from the observable explanatory variables. In this way, our approach is related to factor models. Factor models have a long history in statistical analysis. In recent years, they have attracted substantial attention as a useful tool for dimension reduction and/or statistical forecasting in data-rich environments. See, for instance, . The use of factor models in GLM analysis, however, is less studied.
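To fix ideas, the factor-based GLM approach described above can be sketched as follows. This is a minimal illustration, not the paper's procedure: the simulated sizes, the use of a logistic link, and the plain SVD-based principal component estimator are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data-rich setting: N explanatory variables driven by r latent factors.
T, N, r = 200, 50, 2
F = rng.normal(size=(T, r))                 # latent common factors
Lam = rng.normal(size=(N, r))               # factor loadings
X = F @ Lam.T + rng.normal(scale=0.5, size=(T, N))

# True GLM (here, logistic) depends on the explanatory variables only
# through the common factors.
beta = np.array([1.0, -1.0])
p = 1.0 / (1.0 + np.exp(-(F @ beta)))
y = rng.binomial(1, p)

# Estimate the factors by principal components (SVD of the centered X).
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
F_hat = np.sqrt(T) * U[:, :r]               # estimated factors, F'F/T = I

# Fit the logistic GLM on the *estimated* factors by Newton-Raphson (IRLS).
Z = np.column_stack([np.ones(T), F_hat])    # intercept + estimated regressors
b = np.zeros(Z.shape[1])
for _ in range(25):
    mu = 1.0 / (1.0 + np.exp(-(Z @ b)))
    W = mu * (1.0 - mu)
    b += np.linalg.solve(Z.T @ (W[:, None] * Z), Z.T @ (y - mu))

print(b)  # coefficients on the intercept and the r estimated factors
```

Note that the regressors `F_hat` are themselves estimates; accounting for this estimation effect when selecting the number of factors is precisely the issue the proposed criterion addresses.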
The goal of our study is to close this gap.

In time series analysis, when the number of explanatory series N and the sample size T increase at a similar rate, dynamic factor models and diffusion-index models have been proposed to improve the accuracy of prediction. See, for instance, Stock and Watson [23]. Applications of such models often involve a two-step procedure. In the first step, one uses principal component analysis (or a variant thereof) to transform the explanatory observations into common factors, and certain criteria, e.g., the scree plot, are then used to select the number of common factors. In the second step, one applies the least-squares method to fit a linear model for the dependent variable, using the selected principal components along with some pre-determined variables as regressors; the pre-determined variables may include lagged values of the dependent variable. The fitted model is then used to make statistical predictions. This approach is plausible and, indeed, can be justified asymptotically under certain regularity conditions. See, for instance, Stock and Watson [22]. However, the selection of the number of common factors, or more precisely the choice of common factors, deserves a careful investigation.
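The two-step diffusion-index procedure can be sketched as follows. This is a simplified illustration under assumed data-generating values (a fixed number of factors r, one lag of the dependent variable as the pre-determined variable), not the specific setup of Stock and Watson [22,23].

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, r = 240, 80, 3

# Predictor panel with an r-factor structure.
F = rng.normal(size=(T, r))
X = F @ rng.normal(size=(N, r)).T + rng.normal(scale=0.7, size=(T, N))

# Dependent series driven by the factors plus its own lag
# (the lag plays the role of a pre-determined regressor).
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.3 * y[t - 1] + F[t] @ np.array([1.0, -0.5, 0.25]) + rng.normal(scale=0.5)

# Step 1: principal-component estimates of the common factors.
Xc = X - X.mean(axis=0)
U, s, _ = np.linalg.svd(Xc, full_matrices=False)
F_hat = np.sqrt(T) * U[:, :r]

# Step 2: least squares of y_t on a constant, the estimated factors, and y_{t-1}.
Z = np.column_stack([np.ones(T - 1), F_hat[1:], y[:-1]])
coef, *_ = np.linalg.lstsq(Z, y[1:], rcond=None)
print(coef)  # intercept, r factor coefficients, AR(1) coefficient
```

In practice the number of retained factors r is unknown and must be selected in Step 1, which is where the choice of selection criterion matters.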