The LG/J x SM/J advanced intercross line of mice (LG x SM AIL) is a multigenerational outbred population. High minor allele frequencies, a simple genetic background, and the fully sequenced LG and SM genomes make it a powerful population for genome-wide association studies. Here we use 1,063 AIL mice to identify 126 significant associations for 50 traits relevant to human health and disease. We also identify thousands of cis- and trans-eQTLs in the hippocampus, striatum, and prefrontal cortex of ~200 mice. We replicate an association between locomotor activity and Csmd1, which we identified in an earlier generation of this AIL, and show that Csmd1 mutant mice recapitulate the locomotor phenotype. Our results demonstrate the utility of the LG x SM AIL as a mapping population, identify numerous novel associations, and shed light on the genetic architecture of mammalian behavior.
Genome-wide association analyses (GWAS) in model organisms have numerous advantages compared to human GWAS, including the ability to use populations with well-defined genetic diversity, the ability to collect tissue for gene expression analysis and the ability to perform experimental manipulations. We examined behavioral, physiological, and gene expression traits in 1,063 male and female mice from a 50-generation intercross between two inbred strains (LG/J and SM/J). We used genotyping by sequencing in conjunction with whole genome sequence data from the two founder strains to obtain genotypes at 4.3 million SNPs. As expected, all alleles were common (mean MAF=0.35) and linkage disequilibrium degraded rapidly, providing excellent power and sub-megabase mapping precision. We identified 126 genome-wide significant loci for 50 traits and integrated this information with 7,081 cis-eQTLs and 1,476 trans-eQTLs identified in hippocampus, striatum and prefrontal cortex. We replicated several loci that were identified using an earlier generation of this intercross, including an association between locomotor activity and a locus containing a single gene, Csmd1. We also showed that Csmd1 mutant mice recapitulated the locomotor phenotype. Our results demonstrate the utility of this population, identify numerous novel associations, and provide examples of replication in an independent cohort, which is customary in human genetics, and replication by experimental manipulation, which is a unique advantage of model organisms.
Many recent studies have emphasized the importance of genetic variants and mutations in cancer and other complex human diseases. The overwhelming majority of these variants occur in non-coding portions of the genome, where they can have a functional impact by disrupting regulatory interactions between transcription factors (TFs) and DNA. Here, we present a method for assessing the impact of non-coding mutations on TF-DNA interactions, based on regression models of DNA-binding specificity trained on high-throughput in vitro data. We use ordinary least squares (OLS) to estimate the parameters of the binding model for each TF, and we show that our predictions of TF-binding changes due to DNA mutations correlate well with measured changes in gene expression. In addition, by leveraging distributional results associated with OLS estimation, for each predicted change in TF binding we also compute a normalized score (z-score) and a significance value (p-value) reflecting our confidence that the mutation affects TF binding. We use this approach to analyze a large set of pathogenic non-coding variants, and we show that these variants lead to significant differences in TF binding between alleles, compared to a control set of common variants. Thus, our results indicate that there is a strong regulatory component to the pathogenic non-coding variants identified thus far.
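The scoring idea in the abstract above (an OLS binding model, then a z-score and p-value for each predicted change in binding) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the position-wise indicator encoding, the simulated binding data, the function names `encode`, `fit_ols`, and `score_variant`, and the use of a normal approximation for the p-value.

```python
import math
import numpy as np

BASES = "ACGT"

def encode(seq):
    """Intercept plus 3 indicator columns per position (A is the baseline).
    This feature encoding is an illustrative assumption."""
    row = [1.0]
    for base in seq:
        row.extend(1.0 if base == b else 0.0 for b in "CGT")
    return np.array(row)

def fit_ols(seqs, y):
    """Ordinary least squares: returns coefficient estimates and their covariance."""
    X = np.vstack([encode(s) for s in seqs])
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)   # unbiased residual variance estimate
    return beta, sigma2 * xtx_inv      # Cov(beta_hat) = sigma^2 (X^T X)^-1

def score_variant(ref, alt, beta, cov):
    """Predicted binding change alt - ref, its z-score, and a two-sided p-value."""
    d = encode(alt) - encode(ref)
    delta = float(d @ beta)
    se = math.sqrt(float(d @ cov @ d))      # standard error of the predicted change
    z = delta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # normal approximation (large n)
    return delta, z, p

# Simulated stand-in for in vitro binding data (hypothetical, not real measurements):
# a G at position 2 raises binding by 2.0, a T at position 4 lowers it by 1.0.
rng = np.random.default_rng(0)
seqs = ["".join(rng.choice(list(BASES), size=6)) for _ in range(500)]
noise = rng.normal(0.0, 0.1, size=len(seqs))
y = [2.0 * (s[2] == "G") - 1.0 * (s[4] == "T") + e for s, e in zip(seqs, noise)]

beta, cov = fit_ols(seqs, y)
delta, z, p = score_variant("AAAAAA", "AAGAAA", beta, cov)            # hits the G effect
delta_null, z_null, p_null = score_variant("AAAAAA", "CAAAAA", beta, cov)  # no true effect
print(f"A>G at position 2: delta={delta:.3f}, z={z:.1f}, p={p:.2e}")
print(f"A>C at position 0: delta={delta_null:.3f}, z={z_null:.2f}, p={p_null:.3f}")
```

In this sketch the standard error of a predicted change comes directly from the estimated covariance of the OLS coefficients, which is what makes a per-variant significance value possible. With small training sets a t distribution with n - p degrees of freedom would be the more exact reference distribution than the normal approximation used here.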
High-throughput reporter assays such as self-transcribing active regulatory region sequencing (STARR-seq) have made it possible to measure regulatory element activity across the entire human genome at once. The resulting data, however, present substantial analytical challenges. Here, we identify technical biases that explain most of the variance in STARR-seq data. We then develop a statistical model to correct those biases and to improve detection of regulatory elements. This approach substantially improves precision and recall over current methods, improves detection of both activating and repressive regulatory elements, and controls for false discoveries despite strong local correlations in signal.