Summary: Here we describe NanoPack, a set of tools developed for visualization and processing of long-read sequencing data from Oxford Nanopore Technologies and Pacific Biosciences. Availability and implementation: The NanoPack tools are written in Python3 and released under the GNU GPL3.0 License. The source code can be found at https://github.com/wdecoster/nanopack, together with links to separate scripts and their documentation. The scripts are compatible with Linux, Mac OS and the MS Windows 10 subsystem for Linux and are available as a graphical user interface, a web service at http://nanoplot.bioinf.be and command line tools. Supplementary information: Supplementary data are available at Bioinformatics online.
We sequenced the genome of the Yoruban reference individual NA19240 on the long-read sequencing platform Oxford Nanopore PromethION for evaluation and benchmarking of recently published aligners and germline structural variant calling tools, as well as a comparison with the performance of structural variant calling from short-read sequencing data. The structural variant caller Sniffles after NGMLR or minimap2 alignment provides the most accurate results, but additional confidence or sensitivity can be obtained by a combination of multiple variant callers. Sensitive and fast results can be obtained by minimap2 for alignment and a combination of Sniffles and SVIM for variant identification. We describe a scalable workflow for identification, annotation, and characterization of tens of thousands of structural variants from long-read genome sequencing of an individual or population. By discussing the results of this well-characterized reference individual, we provide an approximation of what can be expected in future long-read sequencing studies aiming for structural variant identification.
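The idea of combining several callers for either extra confidence or extra sensitivity can be sketched as follows. This is a minimal illustration, not the workflow used in the study: the `SVCall` record, the `same_event` matching rule and its thresholds (`max_dist`, `min_len_ratio`) are hypothetical choices made for the example, and the calls are invented.

```python
from typing import NamedTuple


class SVCall(NamedTuple):
    """A simplified structural variant call (hypothetical record layout)."""
    chrom: str
    pos: int
    svtype: str   # e.g. "DEL" or "INS"
    length: int


def same_event(a, b, max_dist=500, min_len_ratio=0.7):
    """Treat two calls as the same event if type and chromosome agree,
    positions are close, and lengths are similar. Thresholds are assumptions."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    if abs(a.pos - b.pos) > max_dist:
        return False
    shorter, longer = sorted((abs(a.length), abs(b.length)))
    return longer == 0 or shorter / longer >= min_len_ratio


def combine(calls_a, calls_b):
    """Return (supported, union).

    supported: calls from caller A confirmed by caller B (higher confidence).
    union: all distinct events seen by either caller (higher sensitivity).
    """
    supported = [a for a in calls_a if any(same_event(a, b) for b in calls_b)]
    only_b = [b for b in calls_b if not any(same_event(a, b) for a in calls_a)]
    return supported, list(calls_a) + only_b


# Invented example calls from two callers
caller_a = [SVCall("chr1", 1000, "DEL", 300), SVCall("chr2", 5000, "INS", 120)]
caller_b = [SVCall("chr1", 1040, "DEL", 280), SVCall("chr3", 700, "DEL", 90)]
supported, union = combine(caller_a, caller_b)
```

Here `supported` keeps only the chr1 deletion seen by both callers, while `union` keeps all three distinct events. In practice a dedicated SV merging tool operating on VCF files would normally take the place of this toy matching rule.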
Premature termination codon (PTC) mutations in the ATP-Binding Cassette, Sub-Family A, Member 7 gene (ABCA7) have recently been identified as intermediate-to-high penetrant risk factors for late-onset Alzheimer’s disease (LOAD). High variability, however, is observed in downstream ABCA7 mRNA and protein expression, disease penetrance, and onset age, indicative of unknown modifying factors. Here, we investigated the prevalence and disease penetrance of ABCA7 PTC mutations in a large early-onset AD (EOAD) patient-control cohort, and examined the effect on transcript level with comprehensive third-generation long-read sequencing. We characterized the ABCA7 coding sequence with next-generation sequencing in 928 EOAD patients and 980 matched control individuals. With MetaSKAT rare variant association analysis, we observed a fivefold enrichment (p = 0.0004) of PTC mutations in EOAD patients (3%) versus controls (0.6%). Ten novel PTC mutations were only observed in patients, and PTC mutation carriers in general had an increased familial AD load. In addition, we observed nominal risk-reducing trends for three common coding variants. Seven PTC mutations were further analyzed using targeted long-read cDNA sequencing on an Oxford Nanopore MinION platform. PTC-containing transcripts for each investigated PTC mutation were observed at varying proportions (5–41% of the total read count), implying incomplete nonsense-mediated mRNA decay (NMD). Furthermore, we distinguished and phased several previously unknown alternative splicing events (up to 30% of transcripts). In conjunction with PTC mutations, several of these novel ABCA7 isoforms have the potential to rescue deleterious PTC effects. In conclusion, ABCA7 PTC mutations play a substantial role in EOAD, warranting genetic screening of ABCA7 in genetically unexplained patients. Long-read cDNA sequencing revealed both varying degrees of NMD and transcript-modifying events, which may influence ABCA7 dosage and disease severity, and may create opportunities for therapeutic interventions in AD. Electronic supplementary material: The online version of this article (doi:10.1007/s00401-017-1714-x) contains supplementary material, which is available to authorized users.
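The reported fivefold enrichment follows directly from the two carrier frequencies given above. The sketch below only reproduces that arithmetic. It does not reproduce the MetaSKAT test or its p value, and the carrier counts it prints are rounded values implied by the reported percentages, not counts taken from the study.

```python
def fold_enrichment(freq_cases: float, freq_controls: float) -> float:
    """Ratio of carrier frequency in cases to carrier frequency in controls."""
    if freq_controls <= 0:
        raise ValueError("control frequency must be positive")
    return freq_cases / freq_controls


# Cohort sizes and carrier frequencies as reported in the abstract
n_patients, n_controls = 928, 980
freq_patients, freq_controls = 0.03, 0.006

fold = fold_enrichment(freq_patients, freq_controls)

# Approximate carrier counts implied by the percentages (an assumption:
# the abstract reports rounded percentages, not exact counts)
approx_patient_carriers = round(n_patients * freq_patients)
approx_control_carriers = round(n_controls * freq_controls)

print(f"fold enrichment: {fold:.1f}")
print(f"implied carriers: ~{approx_patient_carriers} patients, "
      f"~{approx_control_carriers} controls")
```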
A substantial amount of structural variation in the human genome remains uninvestigated due to the limitations of existing technologies, the presence of repetitive sequences, and the complexity of a diploid genome. New technologies have been developed, increasing resolution and appreciation of structural variation and how it affects human diversity and disease. The genetic etiology of most patients with complex disorders such as neurodegenerative brain diseases is not yet elucidated, complicating disease diagnosis, genetic counseling, and understanding of underlying pathological mechanisms needed to develop therapeutic interventions. Here, we focus on innovative progress and opportunities provided by the newest methods such as linked-read sequencing, strand-specific sequencing, and long-read sequencing. Finally, we describe a strategy for generating a comprehensive catalog of structural variations across populations.
Structural Variation Has Been Systematically Missed
Multiple projects have comprehensively cataloged small genetic variants [single nucleotide variants (SNVs)]; however, structural variants (SVs) remain largely underrepresented [1,2]. SVs are defined as regions of DNA larger than 50 bp showing a change in copy number or genomic location including copy number variants (CNVs; deletions and duplications), insertions, inversions, translocations, mobile elements, repetitive sequence expansions, and complex combinations thereof (Figure 1) [3]. SVs contribute approximately 3.4 times more nucleotides to human genetic variation than the far more numerous SNVs [4]. Multiple sequencing technologies (see Glossary), notably long-read sequencing and short-read sequencing library preparation methods such as strand-specific sequencing (strand-seq) and linked-read sequencing, have been developed, providing unprecedented potential and accuracy for genome-wide detection of structural variation.
Here, we describe new improvements to SV detection methods, each with different strengths and shortcomings. One landmark study combined several technologies to obtain a complete haplotype-specific characterization in healthy human trios, yielding >30 000 SVs per genome, including >150 inversions and large unbalanced chromosomal rearrangements [5]. This work suggested that previous technologies missed most of the SVs due to technical limitations. The majority of SVs are novel and rare variants, implying that structural variation databases are not yet saturated [2-7].
Highlights: The genetic etiology of complex human diseases is still poorly understood, hampering diagnostics and therapeutic approaches.
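The size-based part of the SV definition given above (larger than 50 bp) can be illustrated with a small sketch. It is a simplification made for this example: it handles only sequence-resolved insertions and deletions described by a reference and an alternative allele, and the function name and category labels are hypothetical. Inversions, translocations and other rearrangements need breakpoint information and are not covered.

```python
SV_MIN_LENGTH = 50  # size threshold from the definition in the text (bp)


def classify_variant(ref: str, alt: str) -> str:
    """Classify a simple ref/alt allele pair by the size of the change."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    size_change = len(alt) - len(ref)
    if abs(size_change) > SV_MIN_LENGTH:
        # Large net gain or loss of sequence counts as a structural variant
        return "SV:insertion" if size_change > 0 else "SV:deletion"
    return "small indel or substitution"


# Invented alleles for illustration
print(classify_variant("A", "G"))              # single-base change
print(classify_variant("A", "A" + "T" * 60))   # 60 bp gained
print(classify_variant("A" + "C" * 100, "A"))  # 100 bp lost
print(classify_variant("A", "AT"))             # 1 bp gained
```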
Emerging evidence suggests a converging mechanism in neurodegenerative brain diseases (NBD) involving early neuronal network dysfunctions and alterations in the homeostasis of neuronal firing as culprits of neurodegeneration. In this study, we used paired-end short-read and direct long-read whole genome sequencing to investigate an unresolved autosomal dominant dementia family significantly linked to 7q36. We identified and validated a chromosomal inversion of ca. 4 Mb, segregating on the disease haplotype and disrupting the coding sequence of the dipeptidyl-peptidase 6 gene (DPP6). DPP6 resequencing identified significantly more rare variants (nonsense, frameshift, and missense) in early-onset Alzheimer’s disease (EOAD, p = 0.03, OR = 2.21, 95% CI 1.05–4.82) and frontotemporal dementia (FTD, p = 0.006, OR = 2.59, 95% CI 1.28–5.49) patient cohorts. DPP6 is a type II transmembrane protein with a highly structured extracellular domain and is mainly expressed in brain, where it binds to the potassium channel Kv4.2, enhancing its expression, regulating its gating properties and controlling the dendritic excitability of hippocampal neurons. Using in vitro modeling, we showed that the missense variants found in patients destabilize DPP6 and reduce its membrane expression (p < 0.001 and p < 0.0001), leading to a loss of protein. Reduced DPP6 and/or Kv4.2 expression was also detected in brain tissue of missense variant carriers. Loss of DPP6 is known to cause neuronal hyperexcitability and behavioral alterations in Dpp6-KO mice. Taken together, the results of our genomic, genetic, expression and modeling analyses provide direct evidence supporting the involvement of DPP6 loss in dementia. We propose that loss-of-function variants have a higher penetrance and disease impact, whereas the missense variants have a variable risk contribution to disease that can vary from high to low penetrance. Our findings of DPP6 as a novel gene in dementia strengthen the involvement of neuronal hyperexcitability and alteration in the homeostasis of neuronal firing as a disease mechanism warranting further investigation. Electronic supplementary material: The online version of this article (10.1007/s00401-019-01976-3) contains supplementary material, which is available to authorized users.
Technological limitations have hindered the large-scale genetic investigation of tandem repeats in disease. We show that long-read sequencing with a single Oxford Nanopore Technologies PromethION flow cell per individual achieves 30× human genome coverage and enables accurate assessment of tandem repeats including the 10,000-bp Alzheimer’s disease-associated ABCA7 VNTR. The Guppy “flip-flop” base caller and tandem-genotypes tandem repeat caller are efficient for large-scale tandem repeat assessment, but base calling and alignment challenges persist. We present NanoSatellite, which analyzes tandem repeats directly on electric current data and improves calling of GC-rich tandem repeats, expanded alleles, and motif interruptions.
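To put the 30× figure in perspective, mean coverage is the total number of sequenced bases divided by the genome size. The sketch below assumes a haploid human genome size of about 3.1 Gb, which is a round figure chosen for the example and not a value stated in the abstract. Under that assumption, 30× corresponds to roughly 93 Gb of sequence from the flow cell.

```python
HUMAN_GENOME_BP = 3.1e9  # assumed approximate haploid genome size (bp)


def mean_coverage(total_bases: float, genome_size: float = HUMAN_GENOME_BP) -> float:
    """Average depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def required_yield(coverage: float, genome_size: float = HUMAN_GENOME_BP) -> float:
    """Total bases needed to reach a target mean coverage."""
    return coverage * genome_size


yield_for_30x = required_yield(30)
print(f"yield needed for 30x: {yield_for_30x / 1e9:.0f} Gb")
print(f"coverage from that yield: {mean_coverage(yield_for_30x):.0f}x")
```

Mean coverage says nothing about individual loci: a tandem repeat such as the ABCA7 VNTR is only assessed well by reads long enough to span it, so read length matters alongside depth.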
Acinetobacter baumannii is a bacterium prioritized by the CDC and WHO because of its increasing antibiotic resistance, leading to treatment failures. The hallmark of this pathogen is the high heterogeneity observed among isolates, due to a very dynamic genome.