Across a variety of Mendelian disorders, ∼50–75% of patients do not receive a genetic diagnosis by exome sequencing, suggesting that disease-causing variants lie in non-coding regions. Although genome sequencing in principle reveals all genetic variants, their sizeable number and poorer annotation make prioritization challenging. Here, we demonstrate the power of transcriptome sequencing to molecularly diagnose 10% (5 of 48) of mitochondriopathy patients and identify candidate genes for the remainder. We find a median of one aberrantly expressed gene, five aberrant splicing events and six mono-allelically expressed rare variants in patient-derived fibroblasts and establish disease-causing roles for each kind. Private exons often arise from cryptic splice sites, providing an important clue for variant prioritization. One such event is found in the complex I assembly factor TIMMDC1, establishing a novel disease-associated gene. In conclusion, our study expands the diagnostic tools for detecting non-exonic variants and provides examples of intronic loss-of-function variants with pathological relevance.
Across a large variety of Mendelian disorders, ~50-75% of patients do not receive a genetic diagnosis by whole exome sequencing indicative of underlying disease-causing variants in non-coding regions. In contrast, whole genome sequencing facilitates the discovery of all genetic variants, but their sizeable number, coupled with a poor understanding of the non-coding genome, makes their prioritization challenging. Here, we demonstrate the power of transcriptome sequencing to provide a confirmed genetic diagnosis for 10% (5 of 48) of undiagnosed mitochondrial disease patients and identify strong candidate genes for patients remaining without diagnosis. We found a median of 1 aberrantly expressed gene, 5 aberrant splicing events, and 6 mono-allelically expressed rare variants in patient-derived fibroblasts and established disease-causing roles for each kind. Private exons often arose from sites that are weakly spliced in other individuals, providing an important clue for future variant prioritization. One such intronic exon-creating variant was found in three unrelated families in the complex I assembly factor TIMMDC1, which we consequently established as a novel disease-associated gene. In conclusion, our study expands the diagnostic tools for detecting non-exonic variants of Mendelian disorders and provides examples of intronic loss-of-function variants with pathological relevance.
Despite the revolutionizing impact of whole exome sequencing (WES) on the molecular genetics of Mendelian disorders, ~50-75% of the patients do not receive a genetic diagnosis after WES [1][2][3][4][5][6]. The disease-causing variants might be detected by WES but remain as variants of unknown significance (VUS, Methods) or they are missed due to the inability to prioritize them.
Many of these VUS are synonymous or non-coding variants that may affect RNA abundance or isoform but cannot be prioritized because regulatory sequence is, to date, poorly understood compared to coding sequence. Furthermore, WES covers only the 2% exonic regions of the genome. Accordingly, it is mostly blind to regulatory variants in non-coding regions that could affect RNA sequence and abundance. While the limitation of genome coverage is overcome by whole genome sequencing (WGS), prioritization and interpretation of variants identified by WGS are in turn limited by their sheer number [7][8][9].
With RNA sequencing (RNA-seq), the limitations of genetic information alone can be complemented by directly probing variations in RNA abundance and in RNA sequence, including allele-specific expression and splice isoforms. At least three extreme situations can be directly interpreted to prioritize candidate disease-causing genes for a rare disorder. First, the expression level of a gene can lie outside its physiological range. Genes with expression outside their physiological range can be identified as expression outliers, often using a stringent cutoff on expression variat...
Acute liver failure (ALF) in infancy and childhood is a life-threatening emergency. Few conditions are known to cause recurrent acute liver failure (RALF), and in about 50% of cases, the underlying molecular cause remains unresolved. Exome sequencing in five unrelated individuals with fever-dependent RALF revealed biallelic mutations in NBAS. Subsequent Sanger sequencing of NBAS in 15 additional unrelated individuals with RALF or ALF identified compound heterozygous mutations in an additional six individuals from five families. Immunoblot analysis of mutant fibroblasts showed reduced protein levels of NBAS and its proposed interaction partner p31, both involved in retrograde transport between endoplasmic reticulum and Golgi. We recommend NBAS analysis in individuals with acute infantile liver failure, especially if triggered by fever.
RNA sequencing (RNA-seq) is gaining popularity as a complementary assay to genome sequencing for precisely identifying the molecular causes of rare disorders. A powerful approach is to identify aberrant gene expression levels as potential pathogenic events. However, existing methods for detecting aberrant read counts in RNA-seq data either lack assessments of statistical significance, so that establishing cutoffs is arbitrary, or rely on subjective manual corrections for confounders. Here, we describe OUTRIDER (Outlier in RNA-Seq Finder), an algorithm developed to address these issues. The algorithm uses an autoencoder to model read-count expectations according to the gene covariation resulting from technical, environmental, or common genetic variations. Given these expectations, the RNA-seq read counts are assumed to follow a negative binomial distribution with a gene-specific dispersion. Outliers are then identified as read counts that significantly deviate from this distribution. The model is automatically fitted to achieve the best recall of artificially corrupted data. Precision-recall analyses using simulated outlier read counts demonstrated the importance of controlling for covariation and significance-based thresholds. OUTRIDER is open source and includes functions for filtering out genes not expressed in a dataset, for identifying outlier samples with too many aberrantly expressed genes, and for detecting aberrant gene expression on the basis of false-discovery-rate-adjusted p values. Overall, OUTRIDER provides an end-to-end solution for identifying aberrantly expressed genes and is suitable for use by rare-disease diagnostic platforms.
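The statistical step described above (negative binomial read counts with a gene-specific dispersion, followed by false-discovery-rate adjustment) can be illustrated with a minimal sketch. This is not the OUTRIDER implementation: the autoencoder is omitted and the expected counts are simply passed in, the two-sided p-value convention is one common choice assumed here, and all function names (`nb_cdf`, `two_sided_p`, `benjamini_hochberg`, `call_outliers`) are hypothetical.

```python
import math

def nb_log_pmf(k, mu, theta):
    """Log-probability of count k under a negative binomial with mean mu
    and dispersion (size) theta, i.e. variance = mu + mu**2 / theta."""
    return (math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
            + theta * math.log(theta / (theta + mu))
            + k * math.log(mu / (theta + mu)))

def nb_cdf(k, mu, theta):
    """P(X <= k), computed by direct summation (fine for a toy example)."""
    if k < 0:
        return 0.0
    total = sum(math.exp(nb_log_pmf(i, mu, theta)) for i in range(k + 1))
    return min(1.0, total)

def two_sided_p(k, mu, theta):
    """Two-sided p-value: twice the smaller tail, capped at 1 (assumed convention)."""
    lower = nb_cdf(k, mu, theta)            # P(X <= k)
    upper = 1.0 - nb_cdf(k - 1, mu, theta)  # P(X >= k)
    return 2.0 * min(0.5, lower, upper)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def call_outliers(counts, expected, theta, alpha=0.05):
    """Flag samples whose count deviates significantly from its expectation.

    counts and expected hold one gene's observed and expected read counts
    across samples; theta is that gene's dispersion (assumed already fitted).
    """
    pvals = [two_sided_p(k, mu, theta) for k, mu in zip(counts, expected)]
    padj = benjamini_hochberg(pvals)
    return [q < alpha for q in padj]

if __name__ == "__main__":
    # Hypothetical gene: four samples, all expected at ~100 reads.
    print(call_outliers([0, 95, 110, 102], [100.0] * 4, theta=10.0))
```

The significance cutoff `alpha` and the dispersion value in the demo are illustrative only; in practice both the expectations and the dispersion would be fitted from the data.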
High-throughput DNA sequencing (HTS) is of increasing importance in the life sciences. One of its most prominent applications is the sequencing of whole genomes or targeted regions of the genome such as all exonic regions (i.e., the exome). Here, the objective is the identification of genetic variants such as single nucleotide polymorphisms (SNPs). The extraction of SNPs from the raw genetic sequences involves many processing steps and the application of a diverse set of tools. We review the essential building blocks for a pipeline that calls SNPs from raw HTS data. The pipeline includes quality control, mapping of short reads to the reference genome, and visualization and post-processing of the alignment, including base quality recalibration. The final steps of the pipeline include the SNP calling procedure along with filtering of SNP candidates. The steps of this pipeline are accompanied by an analysis of a publicly available whole-exome sequencing dataset. To this end, we employ several alignment programs and SNP calling routines to highlight that the choice of tools significantly affects the final results.
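The final filtering step of such a pipeline can be sketched as follows. This is only an illustration under stated assumptions: it reads VCF-style tab-separated records, keeps single-nucleotide, biallelic candidates, and applies hard thresholds on the `QUAL` column and the `DP` (depth) entry of the `INFO` column. The function name `filter_snps` and the threshold values are hypothetical and are not taken from the review.

```python
from typing import Dict, Iterable, Iterator, NamedTuple

class Snp(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float
    depth: int

def parse_info(info: str) -> Dict[str, str]:
    """Turn a VCF INFO string such as 'DP=25;AF=0.5' into a dict."""
    out = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        out[key] = value
    return out

def filter_snps(lines: Iterable[str],
                min_qual: float = 30.0,   # illustrative threshold
                min_depth: int = 10       # illustrative threshold
                ) -> Iterator[Snp]:
    """Yield biallelic SNP candidates passing quality and depth cutoffs."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, qual, _filter, info = fields[:8]
        # Keep single-base substitutions only (no indels, no multi-allelic sites).
        if len(ref) != 1 or len(alt) != 1 or alt == "." or qual == ".":
            continue
        depth = int(parse_info(info).get("DP", 0))
        if float(qual) >= min_qual and depth >= min_depth:
            yield Snp(chrom, int(pos), ref, alt, float(qual), depth)

if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t100\t.\tA\tG\t50.0\t.\tDP=25",
        "chr1\t200\t.\tC\tT\t12.0\t.\tDP=40",
    ]
    for snp in filter_snps(example):
        print(snp)
```

Hard cutoffs like these are the simplest form of candidate filtering; as the abstract notes, results depend substantially on tool and parameter choices, so the thresholds should be treated as placeholders.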