We report genome sequences of 17 inbred strains of laboratory mice and identify almost ten times more variants than previously known. We use these genomes to explore the phylogenetic history of the laboratory mouse and to examine the functional consequences of allele-specific variation on transcript abundance, revealing that at least 12% of transcripts show a significant tissue-specific expression bias. By identifying candidate functional variants at 718 quantitative trait loci we show that the molecular nature of functional variants and their position relative to genes vary according to the effect size of the locus. These sequences provide a starting point for a new era in the functional analysis of a key model organism.
Summary: The ADAR RNA-editing enzymes deaminate adenosine bases to inosines in cellular RNAs. Aberrant interferon expression occurs in patients in whom ADAR1 mutations cause Aicardi-Goutières syndrome (AGS) or dystonia arising from striatal neurodegeneration. Adar1 mutant mouse embryos show aberrant interferon induction and die by embryonic day E12.5. We demonstrate that Adar1 embryonic lethality is rescued to live birth in Adar1; Mavs double mutants in which the antiviral interferon induction response to cytoplasmic double-stranded RNA (dsRNA) is prevented. Aberrant immune responses in Adar1 mutant mouse embryo fibroblasts are dramatically reduced by restoring the expression of editing-active cytoplasmic ADARs. We propose that inosine in cellular RNA inhibits antiviral inflammatory and interferon responses by altering RLR interactions. Transfecting dsRNA oligonucleotides containing inosine-uracil base pairs into Adar1 mutant mouse embryo fibroblasts reduces the aberrant innate immune response. ADAR1 mutations causing AGS affect the activity of the interferon-inducible cytoplasmic isoform more severely than the nuclear isoform.
Structural variation is widespread in mammalian genomes and is an important cause of disease, but just how abundant and important structural variants (SVs) are in shaping phenotypic variation remains unclear. Without knowing how many SVs there are, and how they arise, it is difficult to discover what they do. Combining experimental with automated analyses, we identified 711,920 SVs at 281,243 sites in the genomes of thirteen classical and four wild-derived inbred mouse strains. The majority of SVs are less than 1 kilobase in size and 98% are deletions or insertions. The breakpoints of 160,000 SVs were mapped to base pair resolution, allowing us to infer that insertion of retrotransposons causes more than half of SVs. Yet, despite their prevalence, SVs are less likely than other sequence variants to cause gene expression or quantitative phenotypic variation. We identified 24 SVs that disrupt coding exons, acting as rare variants of large effect on gene function. One-third of the genes so affected have immunological functions.
Background: Transposable element (TE)-derived sequence dominates the landscape of mammalian genomes and can modulate gene function by dysregulating transcription and translation. Our current knowledge of TEs in laboratory mouse strains is limited primarily to those present in the C57BL/6J reference genome, with most mouse TEs being drawn from three distinct classes, namely short interspersed nuclear elements (SINEs), long interspersed nuclear elements (LINEs) and the endogenous retrovirus (ERV) superfamily. Despite their high prevalence, the different genomic and gene properties controlling whether TEs are preferentially purged from, or are retained by, genetic drift or positive selection in mammalian genomes remain poorly defined. Results: Using whole genome sequencing data from 13 classical laboratory and 4 wild-derived mouse inbred strains, we developed a comprehensive catalogue of 103,798 polymorphic TE variants. We employ this extensive data set to characterize TE variants across the Mus lineage, and to infer neutral and selective processes that have acted over 2 million years. Our results indicate that the majority of TE variants are introduced through the male germline and that only a minority of TE variants exert detectable changes in gene expression. However, among genes with differential expression across the strains there are twice as many TE variants identified as being putative causal variants as expected. Conclusions: Most TE variants that cause gene expression changes appear to be purged rapidly by purifying selection. Our findings demonstrate that past TE insertions have often been highly deleterious, and help to prioritize TE variants according to their likely contribution to gene expression or phenotype variation.
Neural networks achieve the state-of-the-art in image classification tasks. However, they can encode spurious variations or biases that may be present in the training data. For example, training an age predictor on a dataset that is not balanced for gender can lead to gender-biased predictions (e.g. wrongly predicting that males are older if only elderly males are in the training set). We present two distinct contributions: 1) An algorithm that can remove multiple sources of variation from the feature representation of a network. We demonstrate that this algorithm can be used to remove biases from the feature representation, and thereby improve classification accuracies, when training networks on extremely biased datasets. 2) An ancestral origin database of 14,000 images of individuals from East Asia, the Indian subcontinent, sub-Saharan Africa, and Western Europe. We demonstrate on this dataset, for a number of facial attribute classification tasks, that we are able to remove racial biases from the network feature representation.
Background: Adenosine-to-inosine (A-to-I) editing is a site-selective post-transcriptional alteration of double-stranded RNA by ADAR deaminases that is crucial for homeostasis and development. Recently the Mouse Genomes Project generated genome sequences for 17 laboratory mouse strains and rich catalogues of variants. We also generated RNA-seq data from whole brain RNA from 15 of the sequenced strains. Results: Here we present a computational approach that takes an initial set of transcriptome/genome mismatch sites and filters these calls taking into account systematic biases in alignment, single nucleotide variant calling, and sequencing depth to identify RNA editing sites with high accuracy. We applied this approach to our panel of mouse strain transcriptomes identifying 7,389 editing sites with an estimated false-discovery rate of between 2.9 and 10.5%. The overwhelming majority of these edits were of the A-to-I type, with less than 2.4% not of this class, and only three of these edits could not be explained as alignment artifacts. We validated 24 novel RNA editing sites in coding sequence, including two non-synonymous edits in the Cacna1d gene that fell into the IQ domain portion of the Cav1.2 voltage-gated calcium channel, indicating a potential role for editing in the generation of transcript diversity. Conclusions: We show that despite over two million years of evolutionary divergence, the sites edited and the level of editing at each site are remarkably consistent across the 15 strains. In the Cds2 gene we find evidence for RNA editing acting to preserve the ancestral transcript sequence despite genomic sequence divergence.
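The filtering idea described in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: the `MismatchSite` record, the `filter_candidate_edits` and `fraction_a_to_i` helpers, and the depth and read-support thresholds are all hypothetical, chosen only to show how transcriptome/genome mismatches might be screened against known genomic variants and sequencing depth before being counted as candidate A-to-I edits (inosine is read as G in RNA-seq).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MismatchSite:
    """Hypothetical record for one transcriptome/genome mismatch."""
    chrom: str
    pos: int
    ref_base: str    # genomic base on the transcribed strand
    rna_base: str    # base observed in the RNA-seq reads
    depth: int       # RNA-seq reads covering the site
    alt_reads: int   # reads supporting the mismatch
    known_snv: bool  # True if the strain has a genomic variant call here


def filter_candidate_edits(sites, min_depth=10, min_alt_reads=3):
    """Keep mismatches that are not genomic variants and are well supported.

    The thresholds are illustrative assumptions, not values from the study.
    """
    kept = []
    for site in sites:
        if site.known_snv:
            continue  # a genomic variant explains the mismatch
        if site.depth < min_depth or site.alt_reads < min_alt_reads:
            continue  # too little evidence at this site
        kept.append(site)
    return kept


def fraction_a_to_i(sites):
    """Fraction of sites that look like A-to-I edits (A in genome, G in RNA)."""
    if not sites:
        return 0.0
    a_to_g = sum(1 for s in sites if s.ref_base == "A" and s.rna_base == "G")
    return a_to_g / len(sites)
```

A real analysis would also need to handle alignment artifacts (for example, mismatches near splice junctions or in paralogous regions), which this sketch does not attempt.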
Background: Cornelia de Lange syndrome (CdLS) is a multisystem disorder with distinctive facial appearance, intellectual disability and growth failure as prominent features. Most individuals with typical CdLS have de novo heterozygous loss-of-function mutations in NIPBL with mosaic individuals representing a significant proportion. Mutations in other cohesin components, SMC1A, SMC3, HDAC8 and RAD21 cause less typical CdLS. Methods: We screened 163 affected individuals for coding region mutations in the known genes, 90 for genomic rearrangements, 19 for deep intronic variants in NIPBL and 5 had whole-exome sequencing. Results: Pathogenic mutations [including mosaic changes] were identified in: NIPBL 46 [3] (28.2%); SMC1A 5 [1] (3.1%); SMC3 5 [1] (3.1%); HDAC8 6 [0] (3.6%) and RAD21 1 [0] (0.6%). One individual had a de novo 1.3 Mb deletion of 1p36.3. Another had a 520 kb duplication of 12q13.13 encompassing ESPL1, encoding separase, an enzyme that cleaves the cohesin ring. Three de novo mutations were identified in ANKRD11 demonstrating a phenotypic overlap with KBG syndrome. To estimate the number of undetected mosaic cases we used recursive partitioning to identify discriminating features in the NIPBL-positive subgroup. Filtering of the mutation-negative group on these features classified at least 18% as ‘NIPBL-like’. A computer composition of the average face of this NIPBL-like subgroup was also more typical in appearance than that of all others in the mutation-negative group supporting the existence of undetected mosaic cases. Conclusions: Future diagnostic testing in ‘mutation-negative’ CdLS thus merits deeper sequencing of multiple DNA samples derived from different tissues.
Background: De novo mutations in PURA have recently been described to cause PURA syndrome, a neurodevelopmental disorder characterised by severe intellectual disability (ID), epilepsy, feeding difficulties and neonatal hypotonia. Objectives: To delineate the clinical spectrum of PURA syndrome and study genotype-phenotype correlations. Methods: Diagnostic or research-based exome or Sanger sequencing was performed in individuals with ID. We systematically collected clinical and mutation data on newly ascertained PURA syndrome individuals, evaluated data of previously reported individuals and performed a computational analysis of photographs. We classified mutations based on predicted effect using 3D in silico models of crystal structures of Drosophila-derived Pur-alpha homologues. Finally, we explored genotype-phenotype correlations by analysis of both recurrent mutations as well as mutation classes. Results: We report mutations in PURA (purine-rich element binding protein A) in 32 individuals, the largest cohort described so far. Evaluation of clinical data, including 22 previously published cases, revealed that all have moderate to severe ID and neonatal-onset symptoms, including hypotonia (96%), respiratory problems (57%), feeding difficulties (77%), exaggerated startle response (44%), hypersomnolence (66%) and hypothermia (35%). Epilepsy (54%) and gastrointestinal (69%), ophthalmological (51%) and endocrine problems (42%) were observed frequently. Computational analysis of facial photographs showed subtle facial dysmorphism. No strong genotype-phenotype correlation was identified by subgrouping mutations into functional classes. Conclusion: We delineate the clinical spectrum of PURA syndrome with the identification of 32 additional individuals. The identification of one individual through targeted Sanger sequencing points towards the clinical recognisability of the syndrome. Genotype-phenotype analysis showed no significant correlation between mutation classes and disease severity.