Genome-wide association studies (GWAS) are being conducted at an unprecedented rate in population-based cohorts and have increased our understanding of the pathophysiology of complex disease. The recent application of GWAS to clinic-based cohorts has also yielded genetic predictors of clinical outcomes. Regardless of context, the practical utility of this information will ultimately depend upon the quality of the original data. Quality control (QC) procedures for GWAS are computationally intensive, operationally challenging, and constantly evolving. With each new dataset, new realities are discovered about GWAS data and best practices continue to be developed. The Genomics Workgroup of the National Human Genome Research Institute (NHGRI)-funded electronic Medical Records and Genomics (eMERGE) network has invested considerable effort in developing strategies for QC of these data. The lessons learned by this group will be valuable for other investigators dealing with large-scale genomic datasets. Here we enumerate some of the challenges in QC of GWAS data and describe the approaches that the eMERGE network is using for quality assurance in GWAS data, thereby minimizing potential bias and error in GWAS results. In this protocol we discuss common issues associated with QC of GWAS data, including data file formats, software packages for data manipulation and analysis, sex chromosome anomalies, sample identity, sample relatedness, population substructure, batch effects, and marker quality. We propose best practices and discuss areas of ongoing and future research.
Curr Protoc Hum Genet. Author manuscript; available in PMC 2012 January 1.
Polycystic ovary syndrome (PCOS) is a common, highly heritable complex disorder of unknown aetiology characterized by hyperandrogenism, chronic anovulation and defects in glucose homeostasis. Increased luteinizing hormone relative to follicle-stimulating hormone secretion, insulin resistance and developmental exposure to androgens are hypothesized to play a causal role in PCOS. Here we map common genetic susceptibility loci in European ancestry women for the National Institutes of Health PCOS phenotype, which confers the highest risk for metabolic morbidities, as well as reproductive hormone levels. Three loci reach genome-wide significance in the case–control meta-analysis: two novel loci mapping to chr 8p23.1 and chr 11p14.1, and a chr 9q22.32 locus previously found in Chinese PCOS cohorts. The same chr 11p14.1 SNP, rs11031006, in the region of the follicle-stimulating hormone B polypeptide (FSHB) gene strongly associates with PCOS diagnosis and luteinizing hormone levels. These findings implicate neuroendocrine changes in disease pathogenesis.
An algorithm using commonly available data from five different electronic medical record (EMR) systems can accurately identify type 2 diabetes (T2D) cases and controls for genetic studies across multiple institutions.
Type 2 diabetes (T2D) is more prevalent in African Americans than in Europeans. However, little is known about the genetic risk in African Americans despite the recent identification of more than 70 T2D loci, primarily by genome-wide association studies (GWAS) in individuals of European ancestry. To investigate the genetic architecture of T2D in African Americans, the MEta-analysis of type 2 DIabetes in African Americans (MEDIA) Consortium examined 17 GWAS on T2D comprising 8,284 cases and 15,543 controls in African Americans in stage 1 analysis. Single nucleotide polymorphism (SNP) association analysis was conducted in each study under the additive model after adjustment for age, sex, study site, and principal components. Meta-analysis of approximately 2.6 million genotyped and imputed SNPs in all studies was conducted using an inverse variance-weighted fixed effect model. Replications were performed to follow up 21 loci in up to 6,061 cases and 5,483 controls in African Americans, and 8,130 cases and 38,987 controls of European ancestry. We identified three known loci (TCF7L2, HMGA2 and KCNQ1) and two novel loci (HLA-B and INS-IGF2) at genome-wide significance (4.15×10^−94
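The inverse variance-weighted fixed effect model named in the abstract above can be illustrated with a minimal sketch. This is not the MEDIA Consortium's pipeline. The function name `fixed_effect_meta` and the example effect sizes are hypothetical, and the sketch assumes each study reports a per-SNP effect estimate (for example, a log odds ratio under the additive model) together with its standard error.

```python
import math


def fixed_effect_meta(betas, ses):
    """Pool per-study effect estimates for one SNP using inverse-variance weights.

    betas: per-study effect estimates (e.g. log odds ratios); hypothetical input.
    ses:   per-study standard errors, in the same order as betas.
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")

    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)

    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    # Wald z statistic and two-sided p-value from the standard normal distribution.
    z = pooled_beta / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


if __name__ == "__main__":
    # Made-up numbers for two studies, for illustration only.
    beta, se, z, p = fixed_effect_meta([0.10, 0.20], [0.05, 0.10])
    print(f"pooled beta={beta:.3f}, se={se:.4f}, z={z:.2f}, p={p:.4f}")
```

A fixed effect model assumes all studies estimate the same underlying effect, so more precise studies (smaller standard errors) dominate the pooled estimate. Heterogeneity between studies is not modelled in this sketch.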