We aggregated coding variant data for 81,412 type 2 diabetes cases and
370,832 controls of diverse ancestry, identifying 40 coding variant association
signals (p < 2.2×10⁻⁷): of these,
16 map outside known risk loci. We make two important observations. First, only
five of these signals are driven by low-frequency variants: even for these,
effect sizes are modest (odds ratio ≤1.29). Second, when we used
large-scale genome-wide association data to fine-map the associated variants in
their regional context, accounting for the global enrichment of complex trait
associations in coding sequence, compelling evidence for coding variant
causality was obtained for only 16 signals. At 13 others, the associated coding
variants clearly represent “false leads” with potential to
generate erroneous mechanistic inference. Coding variant associations offer a
direct route to biological insight for complex diseases and identification of
validated therapeutic targets: however, appropriate mechanistic inference
requires careful specification of their causal contribution to disease
predisposition.
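
The fine-mapping step described above, weighing a coding variant against its regional neighbours while allowing for coding enrichment, can be illustrated with a small sketch. This is not the authors' pipeline. It assumes a single causal variant per region, Wakefield-style approximate Bayes factors, and a hypothetical `coding_weight` that stands in for the global enrichment of coding sequence. The variant names, effect sizes, standard errors and the prior standard deviation are all invented for illustration.

```python
import math

def approx_bayes_factor(beta, se, prior_sd=0.2):
    """Wakefield-style approximate Bayes factor in favour of association.

    prior_sd is an assumed prior standard deviation for the true log odds ratio.
    """
    v = se ** 2            # sampling variance of the effect estimate
    w = prior_sd ** 2      # assumed prior variance of the true effect
    z = beta / se
    shrink = w / (v + w)
    return math.sqrt(1.0 - shrink) * math.exp(0.5 * z * z * shrink)

def posterior_probabilities(variants, coding_weight=1.0):
    """Return each variant's posterior probability of being the causal variant.

    Assumes exactly one causal variant in the region. Coding variants get
    prior weight `coding_weight` (hypothetical); all others get weight 1.
    """
    weighted = []
    for v in variants:
        prior = coding_weight if v["coding"] else 1.0
        weighted.append(prior * approx_bayes_factor(v["beta"], v["se"]))
    total = sum(weighted)
    return [w / total for w in weighted]

# Hypothetical region: one coding variant and two noncoding neighbours.
region = [
    {"id": "coding_variant", "coding": True, "beta": 0.10, "se": 0.018},
    {"id": "noncoding_1", "coding": False, "beta": 0.11, "se": 0.017},
    {"id": "noncoding_2", "coding": False, "beta": 0.05, "se": 0.017},
]

flat = posterior_probabilities(region, coding_weight=1.0)
enriched = posterior_probabilities(region, coding_weight=5.0)
for v, p0, p1 in zip(region, flat, enriched):
    print(f"{v['id']}: flat prior {p0:.3f}, coding-enriched prior {p1:.3f}")
```

In this invented region the noncoding neighbour carries the stronger signal, so the coding variant keeps a low posterior probability even after its prior is up-weighted. That is the kind of pattern the abstract calls a "false lead".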

Objectives
UK Biobank is a UK-wide cohort of 502,655 people aged 40–69, recruited from National Health Service registrants between 2006 and 2010, with healthcare data linkage. Type 2 diabetes is a key exposure and outcome. We developed algorithms to define prevalent and incident diabetes for UK Biobank. The algorithms will be implemented by UK Biobank and their results made available to researchers on request.

Methods
We used UK Biobank self-reported medical history and medication to assign prevalent diabetes and type, and tested this against linked primary and secondary care data in Welsh UK Biobank participants. Additionally, we derived and tested algorithms for incident diabetes using linked primary and secondary care data in the English Clinical Practice Research Datalink, and ran these on secondary care data in UK Biobank.

Results and Significance
For prevalent diabetes, 0.001% and 0.002% of people classified as “diabetes unlikely” in UK Biobank had evidence of diabetes in their primary or secondary care record, respectively. Of those classified as “probable” type 2 diabetes, 75% and 96% had specific type 2 diabetes codes in their primary and secondary care records, respectively. For incidence, 95% of people with the type 2 diabetes-specific C10F Read code in primary care had corroborative evidence of diabetes from medications, blood testing or diabetes-specific process-of-care codes. Only 41% of people identified with type 2 diabetes in primary care had secondary care evidence of type 2 diabetes. In contrast, of incident cases identified using type 2 diabetes-specific ICD-10 codes in secondary care, 77% had corroborative evidence of diabetes in primary care. We suggest our definition of prevalent diabetes from UK Biobank baseline data has external validity, and recommend that specific primary care Read codes should be used for incident diabetes to ensure precision. Secondary care data should be used for incident diabetes with caution, as around half of all cases are missed, and a quarter have no corroborative evidence of diabetes in primary care.
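
The closing caution about secondary care data follows from the two percentages reported for incident cases. The short sketch below makes the arithmetic explicit. The variable names are illustrative, and it assumes primary care cases are the reference set when counting "missed" cases.

```python
# Percentages quoted in the abstract for incident type 2 diabetes.
primary_cases_seen_in_secondary = 41   # % of primary care cases with secondary care evidence
secondary_cases_corroborated = 77      # % of secondary care cases corroborated in primary care

# Complements: the share of cases each data source fails to capture or confirm.
missed_by_secondary = 100 - primary_cases_seen_in_secondary
uncorroborated_secondary = 100 - secondary_cases_corroborated

print(f"Primary care cases absent from secondary care: {missed_by_secondary}%")
print(f"Secondary care cases without primary care corroboration: {uncorroborated_secondary}%")
```

The complements are 59% and 23%, which the abstract rounds to "around half" and "a quarter".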