Summary
Complex interplay between genetic and environmental factors characterizes
the etiology of many diseases. Modeling gene-environment (GxE) interactions is
often challenged by the unknown functional form of the environment term in the
true data-generating mechanism. We study the impact of misspecification of the
environmental exposure effect on inference for the GxE interaction term in
linear and logistic regression models. We first examine the asymptotic bias of
the GxE interaction regression coefficient, allowing for confounders as well as
arbitrary misspecification of the exposure and confounder effects. For linear
regression, we show that under gene-environment independence and some
confounder-dependent conditions, the regression coefficient of the GxE
interaction can remain unbiased even when the environment effect is
misspecified. However,
inference on the GxE interaction is still often incorrect. In logistic
regression, we show that the regression coefficient is generally biased if the
genetic factor is associated with the outcome directly or indirectly. Further, we
show that the standard robust sandwich variance estimator for the GxE
interaction does not perform well in practical GxE studies, and we provide an
alternative testing procedure that has better finite sample properties.
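
The linear-regression claim can be illustrated with a small simulation. The
sketch below is not the procedure developed in this work; it is a minimal
illustration under assumptions chosen here for convenience: no confounders, a
genotype G drawn as Binomial(2, 0.3) independently of a standard normal
exposure E, a true exposure effect of E^2 that the fitted model wrongly treats
as linear, and illustrative coefficient values. The function name
`simulate_gxe` and all parameter values are hypothetical.

```python
import numpy as np


def simulate_gxe(n=2000, reps=200, beta_ge=0.5, seed=0):
    """Fit a misspecified linear model repeatedly; return GxE estimates and naive SEs."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(reps)
    naive_se = np.empty(reps)
    for r in range(reps):
        # Assumption: G and E are independent and there are no confounders.
        g = rng.binomial(2, 0.3, size=n).astype(float)
        e = rng.normal(size=n)
        # True mechanism: the exposure acts through E^2, not E.
        y = 1.0 + 0.2 * g + e**2 + beta_ge * g * e + rng.normal(size=n)
        # Working model: intercept, G, linear E, and G*E (E effect misspecified).
        X = np.column_stack([np.ones(n), g, e, g * e])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        # Model-based (homoscedastic) OLS variance of the coefficients.
        sigma2 = resid @ resid / (n - X.shape[1])
        cov = sigma2 * np.linalg.inv(X.T @ X)
        estimates[r] = coef[3]
        naive_se[r] = np.sqrt(cov[3, 3])
    return estimates, naive_se


if __name__ == "__main__":
    est, se = simulate_gxe()
    print("mean GxE estimate:", est.mean())
    print("empirical SD of estimates:", est.std(ddof=1))
    print("mean model-based SE:", se.mean())
```

In this particular setup the average GxE estimate should sit close to the true
value, while the model-based standard error tends to understate the empirical
spread of the estimates, which is one way inference can go wrong even without
bias. Other exposure distributions or functional forms may behave differently.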