Background: Recent innovations in sequencing technologies have provided researchers with the ability to rapidly characterize the microbial content of an environmental or clinical sample with unprecedented resolution. These approaches are producing a wealth of information that is providing novel insights into the microbial ecology of the environment and human health. However, these sequencing-based approaches produce large and complex datasets that require efficient and sensitive computational analysis workflows. Many tools for analyzing metagenomic sequencing data have recently emerged; however, these approaches often suffer from limited specificity and efficiency, and they typically do not include a complete metagenomic analysis framework. Results: We present PathoScope 2.0, a complete bioinformatics framework for rapidly and accurately quantifying the proportions of reads from individual microbial strains present in metagenomic sequencing data from environmental or clinical samples. The pipeline performs all necessary computational analysis steps, including reference genome library extraction and indexing, read quality control and alignment, strain identification, and summarization and annotation of results. We rigorously evaluated PathoScope 2.0 using simulated data and data from the 2011 outbreak of Shiga-toxigenic Escherichia coli O104:H4. Conclusions: The results show that PathoScope 2.0 is a complete, highly sensitive, and efficient approach for metagenomic analysis that outperforms alternative approaches in scope, speed, and accuracy. The PathoScope 2.0 pipeline software is freely available for download at: http://sourceforge.net/projects/pathoscope/.
Emerging next-generation sequencing technologies have revolutionized the collection of genomic data for applications in bioforensics, biosurveillance, and clinical settings. However, to make the most of these new data, new methodology needs to be developed that can accommodate large volumes of genetic data in a computationally efficient manner. We present a statistical framework to analyze raw next-generation sequence reads from purified or mixed environmental or targeted infected tissue samples for rapid species identification and strain attribution against a robust database of known biological agents. Our method, Pathoscope, capitalizes on a Bayesian statistical framework that accommodates information on sequence quality and mapping quality and provides posterior probabilities of matches to a known database of target genomes. Importantly, our approach also incorporates the possibility that multiple species can be present in the sample and considers cases when the sample species/strain is not in the reference database. Furthermore, our approach can accurately discriminate between very closely related strains of the same species with very little coverage of the genome and without the need for multiple alignment steps, extensive homology searches, or genome assembly, all of which are time-consuming and labor-intensive steps. We demonstrate the utility of our approach on genomic data from purified and in silico "environmental" samples from known bacterial agents impacting human health for accuracy assessment and comparison with other approaches.
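The idea of assigning posterior probabilities to reads that match several closely related genomes can be illustrated with a small expectation-maximization loop. This is a simplified sketch of a generic mixture model, not the published Pathoscope model: the function name `reassign_reads`, the input layout, and the strain names are hypothetical, and each read is assumed to carry a positive alignment likelihood for every genome it hits.

```python
def reassign_reads(read_likelihoods, n_iter=50):
    """Estimate genome proportions from ambiguously mapped reads.

    read_likelihoods: one dict per read, mapping genome name to a positive
    alignment likelihood (hypothetical input format, assumed for this sketch).
    Returns a dict mapping genome name to its estimated proportion of reads.
    """
    genomes = sorted({g for read in read_likelihoods for g in read})
    # Start from uniform proportions.
    proportions = {g: 1.0 / len(genomes) for g in genomes}

    for _ in range(n_iter):
        expected_counts = {g: 0.0 for g in genomes}
        for read in read_likelihoods:
            # E-step: posterior probability that this read came from each genome.
            weights = {g: proportions[g] * q for g, q in read.items()}
            norm = sum(weights.values())
            for g, w in weights.items():
                expected_counts[g] += w / norm
        # M-step: proportions are the average posterior mass per genome.
        proportions = {g: expected_counts[g] / len(read_likelihoods) for g in genomes}
    return proportions


# Toy sample: 8 reads unique to strain_A, 2 unique to strain_B,
# and 10 reads that align equally well to both strains.
reads = (
    [{"strain_A": 1.0}] * 8
    + [{"strain_B": 1.0}] * 2
    + [{"strain_A": 1.0, "strain_B": 1.0}] * 10
)
estimated = reassign_reads(reads)
```

In this toy sample the uniquely mapped reads pull the ambiguous reads toward strain_A, so the estimate for strain_A settles near 0.8 rather than the 0.65 that an even split of the ambiguous reads would give.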
Background: The relationships between infections in early life and asthma are not completely understood. Likewise, the clinical relevance of microbial communities present in the respiratory tract is only partially known. A number of microbiome studies analyzing respiratory tract samples have found increased proportions of gamma-Proteobacteria including Haemophilus influenzae, Moraxella catarrhalis, and Firmicutes such as Streptococcus pneumoniae. The aim of this study was to present a new approach that combines RNA microbial identification with host gene expression to characterize and validate metagenomic taxonomic profiling in individuals with asthma. Methods: Using whole metagenomic shotgun RNA sequencing, we characterized and compared the microbial communities of children and adolescents with asthma and of controls. The resulting data were analyzed by partitioning human and microbial reads. Microbial reads were then used to characterize the microbial diversity of each patient and potential differences between asthmatic and healthy groups. Human reads were used to assess the expression of known genes involved in the host immune response to specific pathogens and to detect potential differences between those with asthma and controls. Results: Microbial communities in the nasal cavities of children differed significantly between asthmatics and controls. After read count normalization, some bacterial species were significantly overrepresented in asthma patients (Wald test, p-value < 0.05), including Escherichia coli and Psychrobacter. Among these, Moraxella catarrhalis exhibited ~14-fold overabundance in asthmatics versus controls. Differential host gene expression analysis confirms that the presence of Moraxella catarrhalis is associated with a specific M. catarrhalis core gene signature expressed by the host. Conclusions: For the first time, we show the power of combining RNA taxonomic profiling and host gene expression signatures for microbial identification. Our approach not only identifies microbes from metagenomic data, but also adds support to these inferences by determining if the host is mounting a response against specific infectious agents. In particular, we show that M. catarrhalis is abundant in asthma patients but not in controls, and that its presence is associated with a specific host gene expression signature. Electronic supplementary material: The online version of this article (doi:10.1186/s12920-015-0121-1) contains supplementary material, which is available to authorized users.
Background: The use of sequencing technologies to investigate the microbiome of a sample can positively impact patient healthcare by providing therapeutic targets for personalized disease treatment. However, these samples contain genomic sequences from various sources that complicate the identification of pathogens. Results: Here we present Clinical PathoScope, a pipeline to rapidly and accurately remove host contamination, isolate microbial reads, and identify potential disease-causing pathogens. We have accomplished three essential tasks in the development of Clinical PathoScope. First, we developed an optimized framework for pathogen identification using a computational subtraction methodology in conjunction with read trimming and ambiguous read reassignment. Second, we have demonstrated the ability of our approach to identify multiple pathogens in a single clinical sample, accurately identify pathogens at the subspecies level, and determine the nearest phylogenetic neighbor of novel or highly mutated pathogens using real clinical sequencing data. Finally, we have shown that Clinical PathoScope outperforms previously published pathogen identification methods with regard to computational speed, sensitivity, and specificity. Conclusions: Clinical PathoScope is the only pathogen identification method currently available that can identify multiple pathogens from mixed samples and distinguish between very closely related species and strains in samples with very few reads per pathogen. Furthermore, Clinical PathoScope does not rely on genome assembly and thus can more rapidly complete the analysis of a clinical sample when compared with current assembly-based methods. Clinical PathoScope is freely available at: http://sourceforge.net/projects/pathoscope/. Electronic supplementary material: The online version of this article (doi:10.1186/1471-2105-15-262) contains supplementary material, which is available to authorized users.
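Computational subtraction, as described above, removes host-derived reads before pathogen identification. A minimal sketch of one way such a filter could work follows. The function `subtract_host`, the score dictionaries, and the rule "drop a read when its host alignment score is at least as high as its microbial score" are assumptions made for illustration; they are not taken from the Clinical PathoScope source.

```python
def subtract_host(read_ids, host_scores, microbial_scores):
    """Return the read IDs that remain after host subtraction.

    host_scores / microbial_scores: dicts mapping read ID to the best
    alignment score against the host or microbial reference library
    (hypothetical inputs; reads with no hit are simply absent).
    """
    kept = []
    for read_id in read_ids:
        microbial = microbial_scores.get(read_id)
        host = host_scores.get(read_id)
        if microbial is None:
            # No microbial hit, so the read carries no pathogen evidence.
            continue
        if host is not None and host >= microbial:
            # The host explains the read at least as well: treat it as contamination.
            continue
        kept.append(read_id)
    return kept


# Toy example with four reads.
read_ids = ["r1", "r2", "r3", "r4"]
host_scores = {"r1": 60, "r2": 30}
microbial_scores = {"r1": 40, "r2": 50, "r3": 45}
microbial_reads = subtract_host(read_ids, host_scores, microbial_scores)
```

Here r1 is discarded as host-derived, r4 is discarded because it has no microbial hit, and r2 and r3 pass through to downstream steps such as ambiguous read reassignment.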