Systems genetics relies on common genetic variants to elucidate biologic networks contributing to complex disease-related phenotypes. Mice are ideal model organisms for such approaches, but linkage analysis has been only modestly successful due to low mapping resolution. Association analysis in mice has the potential of much better resolution, but it is confounded by population structure and inadequate power to map traits that explain less than 10% of the variance, typical of mouse quantitative trait loci (QTL). We report a novel strategy for association mapping that combines classic inbred strains for mapping resolution and recombinant inbred strains for mapping power. Using a mixed model algorithm to correct for population structure, we validate the approach by mapping over 2500 cis-expression QTL with a resolution an order of magnitude narrower than traditional QTL analysis. We also report the fine mapping of metabolic traits such as plasma lipids. This resource, termed the Hybrid Mouse Diversity Panel, makes possible the integration of multiple data sets and should prove useful for systems-based approaches to complex traits and studies of gene-by-environment interactions.
SUMMARY Obesity is a highly heritable disease driven by complex interactions between genetic and environmental factors. Human genome-wide association studies (GWAS) have identified a number of loci contributing to obesity; however, a major limitation of these studies is the inability to assess environmental interactions common to obesity. Using a systems genetics approach, we measured obesity traits, global gene expression, and gut microbiota composition in more than 100 inbred strains of mice in response to a high-fat/high-sucrose (HF/HS) diet. Here we show that HF/HS feeding promotes robust, strain-specific changes in obesity that are not accounted for by food intake, and we provide evidence for a genetically determined set-point for obesity. GWAS analysis identified 11 genome-wide significant loci associated with obesity traits, several of which overlap with loci identified in human studies. We also show strong relationships between genotype and gut microbiota plasticity during HF/HS feeding and identify gut microbial phylotypes associated with obesity.
Although genome-wide association studies have successfully identified thousands of risk loci for complex traits, only a handful of the biologically causal variants, responsible for association at these loci, have been successfully identified. Current statistical methods for identifying causal variants at risk loci either use the strength of the association signal in an iterative conditioning framework or estimate probabilities for variants to be causal. A main drawback of existing methods is that they rely on the simplifying assumption of a single causal variant at each risk locus, which is typically invalid at many risk loci. In this work, we propose a new statistical framework that allows for the possibility of an arbitrary number of causal variants when estimating the posterior probability of a variant being causal. A direct benefit of our approach is that we predict a set of variants for each locus that under reasonable assumptions will contain all of the true causal variants with a high confidence level (e.g., 95%) even when the locus contains multiple causal variants. We use simulations to show that our approach provides 20-50% improvement in our ability to identify the causal variants compared to the existing methods at loci harboring multiple causal variants. We validate our approach using empirical data from an expression QTL study of CHI3L2 to identify new causal variants that affect gene expression at this locus. CAVIAR is publicly available online at http://genetics.cs.ucla.edu/caviar/.
Although genome-wide association studies (GWAS) reproducibly identified thousands of risk loci (Hakonarson et al. 2007; Sladek et al. 2007; Zeggini et al. 2007; Yang et al. 2011a,b; Kottgen et al. 2013; Lu et al. 2013; Ripke et al. 2013), only a handful of causal genetic variants (i.e., variants that biologically alter disease risk) have been found (Altshuler et al. 2008; Manolio et al. 2008; McCarthy et al. 2008), thus prohibiting the mechanistic understanding of the genetic basis of common diseases. The linkage disequilibrium (LD) (Pritchard and Przeworski 2001; Reich et al. 2001) structure of the human genome has greatly benefited GWAS in interrogating only a subset of all variants to assay common variation across the genome. Unfortunately, LD hinders the identification of causal variants at risk loci in fine-mapping studies as at each locus, there are often tens to hundreds of variants tightly linked to the reported associated single-nucleotide polymorphism (SNP) (Malo et al. 2008; Maller et al. 2012; Yang et al. 2012). In a continued effort to identify causal variants, many fine-mapping studies that assess genetic variation at known GWAS risk loci are currently underway (Bauer et al. 2013; Coram et al. 2013; Diogo et al. 2013; Gong et al. 2013; Marigorta and Navarro 2013; Peters et al. 2013; Wu et al. 2013). Fine-mapping studies typically follow a two-step procedure. First, a statistical analysis of the association signal is performed to identify a minimum set of SNPs that can explain the signal. Second, the SNPs that ar...
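The idea of a variant set that contains all causal variants with a chosen confidence level can be illustrated with a minimal Python sketch. It is not the CAVIAR implementation: the function name `causal_set`, the variant labels, and the posterior probabilities over causal configurations are all hypothetical, and the step that derives those posteriors from association statistics and LD is not shown. Given such a posterior, the sketch searches by brute force for the smallest set whose probability of covering every causal variant reaches the threshold.

```python
from itertools import combinations


def causal_set(variants, config_posterior, rho=0.95):
    """Return the smallest variant set that contains ALL causal variants
    with posterior probability >= rho, together with that probability.

    config_posterior maps a tuple of causal variants (one causal
    configuration) to its posterior probability; values should sum to 1.
    Brute force: only practical for a handful of variants.
    """
    for size in range(len(variants) + 1):
        best, best_mass = None, -1.0
        for candidate in combinations(variants, size):
            chosen = set(candidate)
            # Probability mass of configurations fully inside the candidate set.
            mass = sum(p for config, p in config_posterior.items()
                       if set(config) <= chosen)
            if mass > best_mass:
                best, best_mass = chosen, mass
        if best_mass >= rho:
            return best, best_mass
    raise ValueError("posterior mass never reaches rho; is it normalised?")


# Hypothetical locus with four variants; two configurations carry two causal variants.
variants = ["snp_a", "snp_b", "snp_c", "snp_d"]
posterior = {
    ("snp_a",): 0.40,
    ("snp_b",): 0.05,
    ("snp_a", "snp_c"): 0.50,
    ("snp_b", "snp_c"): 0.03,
    ("snp_d",): 0.02,
}
selected, coverage = causal_set(variants, posterior, rho=0.95)
print(sorted(selected), round(coverage, 2))
```

In this made-up example no pair of variants reaches 95% coverage (the best pair, `snp_a` with `snp_c`, covers 0.90), so the selected set has three variants. A method that assumed a single causal variant per locus would have no way to represent the two-variant configurations at all.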
Recognition of peptides bound to major histocompatibility complex (MHC) class I molecules by T lymphocytes is an essential part of immune surveillance. Each MHC allele has a characteristic peptide binding preference, which can be captured in prediction algorithms, allowing for the rapid scan of entire pathogen proteomes for peptides likely to bind MHC. Here we make public a large set of 48,828 quantitative peptide-binding affinity measurements relating to 48 different mouse, human, macaque, and chimpanzee MHC class I alleles. We use these data to establish a set of benchmark predictions with one neural network method and two matrix-based prediction methods extensively utilized in our groups. In general, the neural network outperforms the matrix-based predictions mainly due to its ability to generalize even on a small amount of data. We also retrieved predictions from tools publicly available on the internet. While differences in the data used to generate these predictions hamper direct comparisons, we do conclude that tools based on combinatorial peptide libraries perform remarkably well. The transparent prediction evaluation on this dataset provides tool developers with a benchmark for comparison of newly developed prediction methods. In addition to generating and evaluating our own prediction methods, we have established an easily extensible web-based prediction framework that allows automated side-by-side comparisons of prediction methods implemented by experts. This is an advance over the current practice of tool developers having to generate reference predictions themselves, which can lead to underestimating the performance of prediction methods they are not as familiar with as their own. The overall goal of this effort is to provide a transparent prediction evaluation allowing bioinformaticians to identify promising features of prediction methods and providing guidance to immunologists regarding the reliability of prediction tools.
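A side-by-side comparison of prediction methods against measured affinities might look like the following Python sketch. The abstract does not state the evaluation metric, so everything here is an assumption for illustration: peptides are labelled binders at a measured IC50 of 500 nM or below (a commonly used cutoff), each method is scored by the area under the ROC curve, and the function name `auc`, the affinity values, and the two sets of method scores are hypothetical.

```python
def auc(measured_ic50_nm, predicted_score, binder_cutoff_nm=500.0):
    """Area under the ROC curve for one prediction method.

    A peptide counts as a binder when its measured IC50 is at or below
    binder_cutoff_nm (assumed cutoff). Higher predicted_score must mean
    stronger predicted binding. Computed as the fraction of
    (binder, non-binder) pairs ranked correctly, with ties counted as half.
    """
    pairs = list(zip(measured_ic50_nm, predicted_score))
    binders = [s for ic50, s in pairs if ic50 <= binder_cutoff_nm]
    non_binders = [s for ic50, s in pairs if ic50 > binder_cutoff_nm]
    if not binders or not non_binders:
        raise ValueError("need at least one binder and one non-binder")
    wins = 0.0
    for b in binders:
        for n in non_binders:
            if b > n:
                wins += 1.0
            elif b == n:
                wins += 0.5
    return wins / (len(binders) * len(non_binders))


# Hypothetical measurements (nM) and scores from two hypothetical methods.
measured = [12.0, 80.0, 450.0, 900.0, 5000.0, 20000.0]
method_a = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
method_b = [0.9, 0.7, 0.6, 0.3, 0.2, 0.1]

auc_a = auc(measured, method_a)
auc_b = auc(measured, method_b)
print(f"method_a AUC = {auc_a:.3f}, method_b AUC = {auc_b:.3f}")
```

Scoring every method on the same held-out measurements with the same function is what makes the comparison transparent; a rank-based metric such as AUC also avoids having to put scores from different tools on a common scale.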