Summary.
We consider a setting in which rich historical information is available for an established regression model describing the association between a binary outcome variable Y and a set of predicting factors X. This information consists of the estimated coefficients and their standard errors from a large study. We wish to use this summary information to improve estimation and prediction in an expanded model of interest, Y|X, B, where the additional variable B is a new biomarker measured on only a small number of subjects in a new dataset. We develop and evaluate several approaches for translating the external information into constraints on the regression coefficients of a logistic regression model of Y|X, B. Borrowing from the measurement error literature, we establish an approximate relationship among the regression coefficients in the models Pr(Y = 1|X, β), Pr(Y = 1|X, B, γ) and E(B|X, θ) when B is Gaussian. For binary B we propose an alternative expression. Simulation results comparing these methods indicate that historical information on Pr(Y = 1|X, β) can improve the efficiency of estimation and enhance the predictive power of the regression model of interest, Pr(Y = 1|X, B, γ). We illustrate our methodology by enhancing the High-grade Prostate Cancer Prevention Trial Risk Calculator with two new biomarkers: prostate cancer antigen 3 and TMPRSS2:ERG.
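The following sketch illustrates the kind of coefficient relationship described for Gaussian B. It is one standard approximation from the measurement error literature, not necessarily the exact expression derived in this work. The sketch makes three assumptions. First, logit Pr(Y = 1|X, B) = γ0 + γX·X + γB·B. Second, B|X ~ N(θ0 + θX·X, σ²). Third, the logistic function is approximated by a scaled probit with constant c = 16√3/(15π) ≈ 0.588.

Under these assumptions, the marginal model is approximately logistic, with β0 ≈ (γ0 + γB·θ0)/s and βX ≈ (γX + γB·θX)/s, where s = sqrt(1 + c²γB²σ²). The function names and parameter values below are hypothetical and are chosen for illustration only.

```python
import math

# Hypothetical constant from the probit approximation expit(t) ≈ Phi(C * t),
# as commonly used in the measurement error literature (assumption).
C = 16.0 * math.sqrt(3.0) / (15.0 * math.pi)


def expit(t: float) -> float:
    """Numerically stable inverse-logit function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


def implied_beta(gamma0, gamma_x, gamma_b, theta0, theta_x, sigma):
    """Approximate (beta0, beta_x) of logit Pr(Y=1|X) implied by
    logit Pr(Y=1|X,B) = gamma0 + gamma_x*X + gamma_b*B and
    B|X ~ N(theta0 + theta_x*X, sigma^2). Hypothetical helper."""
    scale = math.sqrt(1.0 + C**2 * gamma_b**2 * sigma**2)  # attenuation factor s
    beta0 = (gamma0 + gamma_b * theta0) / scale
    beta_x = (gamma_x + gamma_b * theta_x) / scale
    return beta0, beta_x


def exact_marginal_prob(x, gamma0, gamma_x, gamma_b, theta0, theta_x, sigma, n=4001):
    """Pr(Y=1|X=x), obtained by integrating Pr(Y=1|X,B) over B|X
    with the trapezoid rule on mean +/- 8 standard deviations."""
    mu = theta0 + theta_x * x
    lo, hi = mu - 8.0 * sigma, mu + 8.0 * sigma
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        b = lo + i * h
        dens = math.exp(-0.5 * ((b - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        weight = 0.5 if i in (0, n - 1) else 1.0  # trapezoid end-point weights
        total += weight * expit(gamma0 + gamma_x * x + gamma_b * b) * dens
    return total * h


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    g0, gx, gb = -1.0, 0.8, 1.2
    t0, tx, s = 0.5, 0.3, 1.0
    b0, bx = implied_beta(g0, gx, gb, t0, tx, s)
    for x in (-1.0, 0.0, 1.0):
        approx = expit(b0 + bx * x)
        exact = exact_marginal_prob(x, g0, gx, gb, t0, tx, s)
        print(f"x={x:+.1f}  approx={approx:.4f}  exact={exact:.4f}")
```

Comparing the approximate and numerically integrated probabilities shows how closely the implied coefficients β reproduce the marginal model. The shrinkage factor s also shows why external estimates of β constrain γ and θ jointly, rather than constraining γX alone.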