Rare genetic variants contribute to complex disease risk; however, the abundance of rare variants in human populations remains unknown. We explored this spectrum of variation by sequencing 202 genes encoding drug targets in 14,002 individuals. We find rare variants are abundant (one every 17 bases) and geographically localized, such that even with large sample sizes, rare variant catalogs will be largely incomplete. We used the observed patterns of variation to estimate population growth parameters, the proportion of variants in a given frequency class that are putatively deleterious, and mutation rates for each gene. Overall we conclude that, due to rapid population growth and weak purifying selection, human populations harbor an abundance of rare variants, many of which are deleterious and have relevance to understanding disease risk.
There have been increasing efforts to relate drug efficacy and disease predisposition with genetic polymorphisms. We present statistical tests for association of haplotype frequencies with discrete and continuous traits in samples of unrelated individuals. Haplotype frequencies are estimated through the expectation-maximization algorithm, and each individual in the sample is expanded into all possible haplotype configurations with corresponding probabilities, conditional on their genotype. A regression-based approach is then used to relate inferred haplotype probabilities to the response. The relationship of this technique to commonly used approaches developed for case-control data is discussed. We confirm the proper size of the test under H₀ and find an increase in power under the alternative by comparing test results using inferred haplotypes with single-marker tests using simulated data. More importantly, analysis of real data comprised of a dense map of single nucleotide polymorphisms spaced along a 12-cM chromosomal region allows us to confirm the utility of the haplotype approach as well as the validity and usefulness of the proposed statistical technique. The method appears to be successful in relating data from multiple, correlated markers to response.
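The expectation-maximization step described in the abstract above can be illustrated with a minimal sketch for two biallelic SNPs. It is a toy illustration under stated assumptions, not the authors' implementation: the function name `em_haplotype_freqs`, the genotype coding (0, 1 or 2 copies of allele "1" at each SNP) and the input counts are all hypothetical, and the regression step that relates haplotype probabilities to the response is not shown.

```python
def em_haplotype_freqs(genotype_counts, n_iter=200, tol=1e-10):
    """Estimate two-SNP haplotype frequencies from unphased genotype counts.

    genotype_counts maps (gA, gB) -> number of individuals, where gA and gB
    are the counts (0, 1 or 2) of allele "1" at SNP A and SNP B.
    This input format is an assumption made for illustration.
    """
    haps = [(0, 0), (0, 1), (1, 0), (1, 1)]
    freqs = {h: 0.25 for h in haps}          # uniform starting values
    n_chrom = 2 * sum(genotype_counts.values())

    for _ in range(n_iter):
        expected = {h: 0.0 for h in haps}
        for (ga, gb), count in genotype_counts.items():
            # E-step: expand the genotype into every compatible unordered
            # haplotype pair, weighted by its probability under current freqs.
            configs = []
            for i, h1 in enumerate(haps):
                for h2 in haps[i:]:
                    if h1[0] + h2[0] == ga and h1[1] + h2[1] == gb:
                        p = freqs[h1] * freqs[h2]
                        if h1 != h2:
                            p *= 2           # two orderings of a distinct pair
                        configs.append((h1, h2, p))
            total = sum(p for _, _, p in configs)
            if total == 0:
                continue                     # genotype impossible under current freqs
            for h1, h2, p in configs:
                weight = count * p / total   # posterior weight of this configuration
                expected[h1] += weight
                expected[h2] += weight
        # M-step: new frequencies are expected haplotype counts per chromosome.
        new_freqs = {h: expected[h] / n_chrom for h in haps}
        delta = max(abs(new_freqs[h] - freqs[h]) for h in haps)
        freqs = new_freqs
        if delta < tol:
            break
    return freqs


# Hypothetical counts: only the double heterozygote (1, 1) is phase-ambiguous.
example = em_haplotype_freqs({(0, 0): 4, (2, 2): 4, (1, 1): 2})
```

In this toy data set the unambiguous homozygotes carry only the (0, 0) and (1, 1) haplotypes, so the iterations push the double heterozygotes toward that same phase.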
Genotyping of classical HLA alleles is an essential tool in the analysis of diseases and adverse drug reactions with associations mapping to the major histocompatibility complex (MHC). However, deriving high-resolution HLA types subsequent to whole-genome SNP typing or sequencing is often cost-prohibitive for large samples. An alternative approach takes advantage of the extended haplotype structure within the MHC to predict HLA alleles using dense SNP genotypes, such as those available from genome-wide SNP panels. Current methods for HLA imputation are difficult to apply or may require the user to have access to large training data sets with SNP and HLA types. We propose HIBAG (HLA Imputation using attribute BAGging), which makes predictions by averaging HLA type posterior probabilities over an ensemble of classifiers built on bootstrap samples. We assess the performance of HIBAG using our study data (n = 2,668 subjects of European ancestry) as a training set and HLA data from the British 1958 birth cohort study (n ≈ 1,000 subjects) as independent validation samples. Prediction accuracies for HLA-A, B, C, DRB1 and DQB1 range from 92.2% to 98.1% using a set of SNP markers common to the Illumina 1M Duo, OmniQuad, OmniExpress, 660K and 550K platforms. HIBAG performed well compared to the other two leading methods, HLA*IMP and BEAGLE. This method is implemented in a freely available HIBAG R package that includes pre-fit classifiers for European, Asian, Hispanic and African ancestries, providing a readily available imputation approach without the need to have access to large training datasets.
Genetic factors influence the development of type II diabetes mellitus, but genetic loci for the most common forms of diabetes have not been identified. A genomic scan was conducted to identify loci linked to diabetes and body-mass index (BMI) in Pima Indians, a Native American population with a high prevalence of type II diabetes. Among 264 nuclear families containing 966 siblings, 516 autosomal markers with a median distance between adjacent markers of 6.4 cM were genotyped. Variance-components methods were used to test for linkage with an age-adjusted diabetes score and with BMI. In multipoint analyses, the strongest evidence for linkage with age-adjusted diabetes (LOD = 1.7) was on chromosome 11q, in the region that was also linked most strongly with BMI (LOD = 3.6). Bivariate linkage analyses strongly rejected both the null hypothesis of no linkage with either trait and the null hypothesis of no contribution of the locus to the covariation among the two traits. Sib-pair analyses suggest additional potential diabetes-susceptibility loci on chromosomes 1q and 7q.
To identify single-nucleotide polymorphisms (SNPs) associated with risk and age at onset of Alzheimer disease (AD) in a genomewide association study of 469,438 SNPs.
We review and extend a recent suggestion that fine-scale localization of a disease-susceptibility locus for a complex disease be done on the basis of deviations from Hardy-Weinberg equilibrium among affected individuals. This deviation is driven by linkage disequilibrium between disease and marker loci in the whole population and requires a heterogeneous genetic basis for the disease. A finding of marker-locus Hardy-Weinberg disequilibrium therefore implies disease heterogeneity and marker-disease linkage disequilibrium. Although a lack of departure from Hardy-Weinberg equilibrium at marker loci implies that disease-susceptibility-weighted linkage disequilibria are zero, given disease heterogeneity, it does not follow that the usual measures of linkage disequilibrium are zero. For disease-susceptibility loci with more than two alleles, therefore, care is needed in the drawing of inferences from marker Hardy-Weinberg disequilibria.
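The quantity at the center of the abstract above, the departure from Hardy-Weinberg equilibrium at a marker among affected individuals, can be sketched for a biallelic marker as follows. This is a minimal illustration using the standard disequilibrium coefficient D = P(AA) - p² and its one-degree-of-freedom chi-square statistic; the function name and the genotype counts are hypothetical, and the abstract's disease-susceptibility-weighted measures are not reproduced here.

```python
def hw_disequilibrium(n_AA, n_Aa, n_aa):
    """Return (p, D, chi2) for a biallelic marker.

    p    : sample frequency of allele A
    D    : Hardy-Weinberg disequilibrium coefficient, P(AA) - p^2
    chi2 : 1-df chi-square statistic, n * D^2 / (p * (1 - p))^2
    """
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("at least one genotyped individual is required")
    p = (2 * n_AA + n_Aa) / (2 * n)
    D = n_AA / n - p * p
    if p == 0.0 or p == 1.0:
        return p, D, 0.0                     # monomorphic marker: no test possible
    chi2 = n * D * D / (p * (1 - p)) ** 2
    return p, D, chi2


# Hypothetical genotype counts among affected individuals.
p, D, chi2 = hw_disequilibrium(40, 20, 40)   # excess of homozygotes, so D > 0
```

A positive D indicates an excess of homozygotes relative to Hardy-Weinberg proportions, and a negative D indicates an excess of heterozygotes.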
Technological and scientific advances, stemming in large part from the Human Genome and HapMap projects, have made large-scale, genome-wide investigations feasible and cost effective. These advances have the potential to dramatically impact drug discovery and development by identifying genetic factors that contribute to variation in disease risk as well as drug pharmacokinetics, treatment efficacy, and adverse drug reactions. In spite of the technological advancements, successful application in biomedical research would be limited without access to suitable sample collections. To facilitate exploratory genetics research, we have assembled a DNA resource from a large number of subjects participating in multiple studies throughout the world. This growing resource was initially genotyped with a commercially available genome-wide 500,000 single-nucleotide polymorphism panel. This project includes nearly 6,000 subjects of African-American, East Asian, South Asian, Mexican, and European origin. Seven informative axes of variation identified via principal-component analysis (PCA) of these data confirm the overall integrity of the data and highlight important features of the genetic structure of diverse populations. The potential value of such extensively genotyped collections is illustrated by selection of genetically matched population controls in a genome-wide analysis of abacavir-associated hypersensitivity reaction. We find that matching based on country of origin, identity-by-state distance, and multidimensional PCA do similarly well to control the type I error rate. The genotype and demographic data from this reference sample are freely available through the NCBI database of Genotypes and Phenotypes (dbGaP).
Pathogenic mutations in APP, PSEN1, PSEN2, MAPT and GRN have previously been linked to familial early-onset forms of dementia. Mutation screening in these genes has been performed in either very small series or in single families with late-onset AD (LOAD). Similarly, studies in single families have reported mutations in MAPT and GRN associated with clinical AD, but no systematic screen of a large dataset has been performed to determine how frequently this occurs. We report sequence data for 439 probands from late-onset AD families with a history of four or more affected individuals. Sixty sequenced individuals (13.7%) carried a novel or pathogenic mutation. Eight pathogenic variants (one each in APP and MAPT, two in PSEN1 and four in GRN), three of which are novel, were found in 14 samples. Thirteen additional variants, present in 23 families, did not segregate with disease, but the frequency of these variants is higher in AD cases than controls, indicating that these variants may also modify risk for disease. The frequency of rare variants in these genes in this series is significantly higher than in the 1000 Genomes Project (p = 5.09×10⁻⁵; OR = 2.21; 95% CI = 1.49–3.28) or an unselected population of 12,481 samples (p = 6.82×10⁻⁵; OR = 2.19; 95% CI = 1.347–3.26). Rare coding variants in APP, PSEN1 and PSEN2 increase risk for or cause late-onset AD. The presence of variants in these genes in LOAD and early-onset AD demonstrates that factors other than the mutation can impact the age at onset and penetrance of at least some variants associated with AD. MAPT and GRN mutations can be found in clinical series of AD, most likely due to misdiagnosis. This study clearly demonstrates that rare variants in these genes could explain an important proportion of genetic heritability of AD, which is not detected by GWAS.
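The odds ratios and 95% confidence intervals quoted in the abstract above compare carrier frequencies between two groups. The sketch below shows one common way such figures are computed, the Wald interval on the log odds ratio from a 2×2 table. The abstract does not state which method the authors used, and the counts in the example are hypothetical and are not the study's data.

```python
import math


def odds_ratio_wald_ci(case_carriers, case_noncarriers,
                       control_carriers, control_noncarriers, z=1.96):
    """Return (odds_ratio, ci_low, ci_high) from a 2x2 table of counts.

    Uses the Wald interval exp(log(OR) +/- z * SE), where
    SE = sqrt(1/a + 1/b + 1/c + 1/d). All four cells must be positive.
    """
    a, b, c, d = (case_carriers, case_noncarriers,
                  control_carriers, control_noncarriers)
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts for illustration only.
or_est, ci_low, ci_high = odds_ratio_wald_ci(20, 80, 10, 90)
```

An interval that excludes 1, as both intervals reported in the abstract do, indicates a carrier frequency difference that is significant at the corresponding level.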