Background: Genome-wide association studies and other computational biology techniques are gradually discovering the causal gene variants that contribute to late-onset human diseases. After more than a decade of genome-wide association study efforts, the discovered variants can account for only a fraction of the heritability implied by familial studies, the so-called "missing heritability" problem.
Methods: Computer simulations of polygenic late-onset diseases in an aging population have quantified the decrease in risk allele frequency at older ages that is caused by individuals with higher polygenic risk scores becoming ill proportionately earlier. This effect is most prominent for diseases characterized by high cumulative incidence and high heritability, examples of which include Alzheimer's disease, coronary artery disease, cerebral stroke, and type 2 diabetes.
Results: The incidence rate for late-onset diseases grows exponentially for decades after early onset ages, guaranteeing that the cohorts used for genome-wide association studies overrepresent older individuals with lower polygenic risk scores, whose disease cases are disproportionately due to environmental causes such as old age itself. This mechanism explains the decline in clinical predictive power with age and the lower discovery power of familial studies of heritability and genome-wide association studies. It also explains the relatively constant-with-age heritability found for late-onset diseases of lower prevalence, exemplified by cancers.
Conclusions: For late-onset polygenic diseases showing high cumulative incidence together with high initial heritability, rather than using relatively old age-matched cohorts, study cohorts combining the youngest possible cases with the oldest possible controls may significantly improve the discovery power of genome-wide association studies.

Algorithm 1: Sampling individuals diagnosed with a disease proportionately to their polygenic odds ratio and incidence rate.

for age = 1 to MaxAge do
    numberIllThisYear = I(age) · N              // N is unaffected population
    for i = 1 to numberIllThisYear do
        HRsum = 0                               // will recalculate sum of all HRs
        for u = 1 to N do
            HRsum = HRsum + ORtoHR(G_u)         // calculate the HR total
            LOOKUP(add, HRsum, u)               // add u-th individual to the lookup table
        rand = RandomNumber(0, HRsum)           // pick a random number
        ill = LOOKUP(find, rand, N)             // found newly diagnosed
        N = N − 1                               // decrement in number of healthy individuals
        ProcessAndAnalyze(ill)

Note: each individual is a sampling target whose weight in the LOOKUP() table is proportional to that individual's hazard ratio (HR). Odds ratios (ORs) are converted to HRs, similar to the approach taken by . An individual with an HR of 15 will be 150 times more likely to be sampled than an individual with an HR of 0.1. ProcessAndAnalyze() moves the newly diagnosed individual from the healthy to the ill population pool and accounts for allele distribution, case/control ORs, etc.

Descriptively, the algorithm works as follows. In this prospective simulation, each next individual to be diagnosed with a late-onset disease (LOD) is chosen propo...
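As an illustration only, the sampling loop of Algorithm 1 can be sketched in Python. This is a minimal sketch under stated assumptions, not the simulation code used in the study. The names `simulate_diagnoses`, `identity_or_to_hr`, and `incidence` are hypothetical. The OR-to-HR conversion is left as a pluggable function with an identity placeholder, because the exact conversion is not given in this section, and the yearly case count is truncated to an integer, which is also an assumption. The LOOKUP() table is replaced by `random.choices`, which draws an index with probability proportional to the supplied weights.

```python
import random
from typing import Callable, List, Sequence


def identity_or_to_hr(odds_ratio: float) -> float:
    """Placeholder (assumption): treat the OR as the HR.

    The actual ORtoHR conversion is not specified in this section.
    """
    return odds_ratio


def simulate_diagnoses(
    odds_ratios: Sequence[float],
    incidence: Callable[[int], float],
    max_age: int,
    or_to_hr: Callable[[float], float] = identity_or_to_hr,
    seed: int = 0,
) -> List[List[int]]:
    """Return, for each age 1..max_age, the indices of newly diagnosed individuals."""
    rng = random.Random(seed)
    hazard = [or_to_hr(o) for o in odds_ratios]
    healthy = list(range(len(odds_ratios)))  # unaffected population (N in Algorithm 1)
    diagnosed_by_age: List[List[int]] = []

    for age in range(1, max_age + 1):
        # numberIllThisYear = I(age) * N, truncated to an integer (assumption)
        n_ill = min(int(incidence(age) * len(healthy)), len(healthy))
        newly_ill: List[int] = []
        for _ in range(n_ill):
            # Weights are rebuilt on every draw, mirroring the HRsum recalculation.
            weights = [hazard[u] for u in healthy]
            pos = rng.choices(range(len(healthy)), weights=weights, k=1)[0]
            # Swap-remove the sampled individual from the healthy pool (N = N - 1).
            healthy[pos], healthy[-1] = healthy[-1], healthy[pos]
            newly_ill.append(healthy.pop())
        diagnosed_by_age.append(newly_ill)

    return diagnosed_by_age


if __name__ == "__main__":
    # Toy population: 50 high-risk (HR 15) and 50 low-risk (HR 0.1) individuals.
    toy_ors = [15.0] * 50 + [0.1] * 50
    for age, cases in enumerate(simulate_diagnoses(toy_ors, lambda a: 0.25, 3), start=1):
        high_risk = sum(1 for i in cases if i < 50)
        print(f"age {age}: {len(cases)} cases, {high_risk} high-risk")
```

Rebuilding the weights for every draw keeps the sketch close to the pseudocode but costs time proportional to N per draw; for large populations a structure that supports incremental weight updates would be a natural substitute.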