Motivation: The increasing availability of mitochondria-targeted and off-target sequencing data in whole-exome and whole-genome sequencing studies (WXS and WGS) has risen the demand of effective pipelines to accurately measure heteroplasmy and to easily recognize the most functionally important mitochondrial variants among a huge number of candidates. To this purpose, we developed MToolBox, a highly automated pipeline to reconstruct and analyze human mitochondrial DNA from high-throughput sequencing data.Results: MToolBox implements an effective computational strategy for mitochondrial genomes assembling and haplogroup assignment also including a prioritization analysis of detected variants. MToolBox provides a Variant Call Format file featuring, for the first time, allele-specific heteroplasmy and annotation files with prioritized variants. MToolBox was tested on simulated samples and applied on 1000 Genomes WXS datasets.Availability and implementation: MToolBox package is available at https://sourceforge.net/projects/mtoolbox/.Contact: marcella.attimonelli@uniba.itSupplementary information: Supplementary data are available at Bioinformatics online.
BackgroundWhole Exome Sequencing (WES) is one of the most used and cost-effective next generation technologies that allows sequencing of all nuclear exons. Off-target regions may be captured if they present high sequence similarity with baits. Bioinformatics tools have been optimized to retrieve a large amount of WES off-target mitochondrial DNA (mtDNA), by exploiting the aspecificity of probes, partially overlapping to Nuclear mitochondrial Sequences (NumtS). The 1000 Genomes project represents one of the widest resources to extract mtDNA sequences from WES data, considering the large effort the scientific community is undertaking to reconstruct human population history using mtDNA as marker, and the involvement of mtDNA in pathology.ResultsA previously published pipeline aimed at assembling mitochondrial genomes from off-target WES reads and further improved to detect insertions and deletions (indels) and heteroplasmy in a dataset of 1242 samples from the 1000 Genomes project, enabled to obtain a nearly complete mitochondrial genome from 943 samples (76% analyzed exomes). The robustness of our computational strategy was highlighted by the reduction of reads amount recognized as mitochondrial in the original annotation produced by the Consortium, due to NumtS filtering.An accurate survey was carried out on 1242 individuals. 215 indels, mostly heteroplasmic, and 3407 single base variants were mapped. A homogeneous mismatches distribution was observed along the whole mitochondrial genome, while a lower frequency of indels was found within protein-coding regions, where frameshift mutations may be deleterious. The majority of indels and mismatches found were not previously annotated in mitochondrial databases since conventional sequencing methods were limited to homoplasmy or quasi-homoplasmy detection. Intriguingly, upon filtering out non haplogroup-defining variants, we detected a widespread population occurrence of rare events predicted to be damaging. Eventually, samples were stratified into blood- and lymphoblastoid-derived to detect possibly different trends of mutability in the two datasets, an analysis which did not yield significant discordances.ConclusionsTo the best of our knowledge, this is likely the most extended population-scale mitochondrial genotyping in humans enriched with the estimation of heteroplasmies.
Success rates for genomic analyses of highly heterogeneous disorders can be greatly improved if a large cohort of patient data is assembled to enhance collective capabilities for accurate sequence variant annotation, analysis, and interpretation. Indeed, molecular diagnostics requires the establishment of robust data resources to enable data sharing that informs accurate understanding of genes, variants, and phenotypes. The “Mitochondrial Disease Sequence Data Resource (MSeqDR) Consortium” is a grass-roots effort facilitated by the United Mitochondrial Disease Foundation to identify and prioritize specific genomic data analysis needs of the global mitochondrial disease clinical and research community. A central Web portal (https://mseqdr.org) facilitates the coherent compilation, organization, annotation, and analysis of sequence data from both nuclear and mitochondrial genomes of individuals and families with suspected mitochondrial disease. This Web portal provides users with a flexible and expandable suite of resources to enable variant-, gene-, and exome-level sequence analysis in a secure, Web-based, and user-friendly fashion. Users can also elect to share data with other MSeqDR Consortium members, or even the general public, either by custom annotation tracks or through use of a convenient distributed annotation system (DAS) mechanism. A range of data visualization and analysis tools are provided to facilitate user interrogation and understanding of genomic, and ultimately phenotypic, data of relevance to mitochondrial biology and disease. Currently available tools for nuclear and mitochondrial gene analyses include an MSeqDR GBrowse instance that hosts optimized mitochondrial disease and mitochondrial DNA (mtDNA) specific annotation tracks, as well as an MSeqDR locus-specific database (LSDB) that curates variant data on more than 1,300 genes that have been implicated in mitochondrial disease and/or encode mitochondria-localized proteins. MSeqDR is integrated with a diverse array of mtDNA data analysis tools that are both freestanding and incorporated into an online exome-level dataset curation and analysis resource (GEM.app) that is being optimized to support needs of the MSeqDR community. In addition, MSeqDR supports mitochondrial disease phenotyping and ontology tools, and provides variant pathogenicity assessment features that enable community review, feedback, and integration with the public ClinVar variant annotation resource. A centralized Web-based informed consent process is being developed, with implementation of a Global Unique Identifier (GUID) system to integrate data deposited on a given individual from different sources. Community-based data deposition into MSeqDR has already begun. Future efforts will enhance capabilities to incorporate phenotypic data that enhance genomic data analyses. MSeqDR will fill the existing void in bioinformatics tools and centralized knowledge that are necessary to enable efficient nuclear and mtDNA genomic data interpretation by a range of shareholders across bo...
The HmtDB resource hosts a database of human mitochondrial genome sequences from individuals with healthy and disease phenotypes. The database is intended to support both population geneticists as well as clinicians undertaking the task to assess the pathogenicity of specific mtDNA mutations. The wide application of next-generation sequencing (NGS) has provided an enormous volume of high-resolution data at a low price, increasing the availability of human mitochondrial sequencing data, which called for a cogent and significant expansion of HmtDB data content that has more than tripled in the current release. We here describe additional novel features, including: (i) a complete, user-friendly restyling of the web interface, (ii) links to the command-line stand-alone and web versions of the MToolBox package, an up-to-date tool to reconstruct and analyze human mitochondrial DNA from NGS data and (iii) the implementation of the Reconstructed Sapiens Reference Sequence (RSRS) as mitochondrial reference sequence. The overall update renders HmtDB an even more handy and useful resource as it enables a more rapid data access, processing and analysis. HmtDB is accessible at http://www.hmtdb.uniba.it/.
RNA editing is a widespread co/posttranscriptional mechanism affecting primary RNAs by specific nucleotide modifications, which plays relevant roles in molecular processes including regulation of gene expression and/or the processing of noncoding RNAs. In recent years, the detection of editing sites has been improved through the availability of high-throughput RNA sequencing (RNA-Seq) technologies. Accurate bioinformatics pipelines are essential for the analysis of next-generation sequencing (NGS) data to ensure the correct identification of edited sites. Several pipelines, using various read mappers and variant callers with a wide range of adjustable parameters, are available for the detection of RNA editing events. In this review, we discuss some of the most recent and popular tools and provide guidelines for RNA-Seq data generation and analysis for the detection of RNA editing in massive transcriptome data. Using simulated and real data sets, we provide an overview of their behavior, emphasizing the fact that the RNA editing detection in NGS data sets remains a challenging task.
Assigning a pathogenic role to mitochondrial DNA (mtDNA) variants and unveiling the potential involvement of the mitochondrial genome in diseases are challenging tasks in human medicine. Assuming that rare variants are more likely to be damaging, we designed a phylogeny-based prioritization workflow to obtain a reliable pool of candidate variants for further investigations. The prioritization workflow relies on an exhaustive functional annotation through the mtDNA extraction pipeline MToolBox and includes Macro Haplogroup Consensus Sequences to filter out fixed evolutionary variants and report rare or private variants, the nucleotide variability as reported in HmtDB and the disease score based on several predictors of pathogenicity for non-synonymous variants. Cutoffs for both the disease score as well as for the nucleotide variability index were established with the aim to discriminate sequence variants contributing to defective phenotypes. The workflow was validated on mitochondrial sequences from Leber’s Hereditary Optic Neuropathy affected individuals, successfully identifying 23 variants including the majority of the known causative ones. The application of the prioritization workflow to cancer datasets allowed to trim down the number of candidate for subsequent functional analyses, unveiling among these a high percentage of somatic variants. Prioritization criteria were implemented in both standalone (http://sourceforge.net/projects/mtoolbox/) and web version (https://mseqdr.org/mtoolbox.php) of MToolBox.Electronic supplementary materialThe online version of this article (doi:10.1007/s00439-015-1615-9) contains supplementary material, which is available to authorized users.
The colonization of the nuclear genome by mitochondrial DNA is an ongoing process in eukaryotes and plays an important role in genomic variability. Notwithstanding the DNA sequence availability of about 100 complete eukaryotic genomes, up to now NumtS distribution has been fully reported for a small number of sequenced eukaryotic species. With the aim to clarify the time and way of NumtS evolution, we explored the genomic distribution of NumtS in 23 eukaryotic species using an intra/interspecies in silico approach based on a cross-species similarity search and deeply investigate the evolution of NumtS in mammals. The intra- and interspecies analysis underlined how some mitochondrial regions that populated nuclear genomes can be considered as hotspots. Considering the large amount of NumtS we found in platypus and opossum genomes, we hypothesized the occurrence of an earlier colonization that happened prior to the Prototherian/Therian mammal divergence, approximately 160–210 million years ago. These events are still detectable due to the species-specific dynamics that have affected these genomes. Phylogenetic analyses of NumtS derived from two different mitochondrial DNA loci allowed us to recognize the unusual NumtS evolution that acted differently on primate and non-primate species’ genomes.
MSeqDR is the Mitochondrial Disease Sequence Data Resource, a centralized and comprehensive genome and phenome bioinformatics resource built by the mitochondrial disease community to facilitate clinical diagnosis and research investigations of individual patient phenotypes, genomes, genes, and variants. A central Web portal (https://mseqdr.org) integrates community knowledge from expert-curated databases with genomic and phenotype data shared by clinicians and researchers. MSeqDR also functions as a centralized application server for Web-based tools to analyze data across both mitochondrial and nuclear DNA, including investigator-driven whole exome or genome dataset analyses through MSeqDR-Genesis. MSeqDR-GBrowse supports interactive genomic data exploration and visualization with custom tracks relevant to mtDNA variation and disease. MSeqDR-LSDB is a locus specific database that currently manages 178 mitochondrial diseases, 1,363 genes associated with mitochondrial biology or disease, and 3,711 pathogenic variants in those genes. MSeqDR Disease Portal allows hierarchical tree-style disease exploration to evaluate their unique descriptions, phenotypes, and causative variants. Automated genomic data submission tools are provided that capture ClinVar-compliant variant annotations. PhenoTips is used for phenotypic data submission on de-identified patients using human phenotype ontology terminology. Development of a dynamic informed patient consent process to guide data access is underway to realize the full potential of these resources.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
334 Leonard St
Brooklyn, NY 11211
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.