An evaluation of alternative methods for constructing phylogenies from whole genome sequence data: a case study with Salmonella

Pettengill, James B.; Luo, Yan; Davis, S. Scott; Chen, Yi; González-Escalona, Narjol; Ottesen, Andrea; Rand, Hugh; Allard, Marc W.; Strain, Errol

doi:10.7717/peerj.620

Cited by 45 publications

(51 citation statements)

References 49 publications

“…For the case of this E. coli dataset, the phylogeny inferred by Mugsy, a reference-independent approach, was in topological agreement with other reference-dependent approaches (Table 2). In fact, kSNPv3 was one of the only methods that returned a topology that was inconsistent with all other methods (Table 2); an inconsistent kSNP phylogeny has also been reported in the analysis of other datasets (Pettengill et al, 2014). To analyze this further, we identified SNPs (n = 826) from the NASP run using simulated paired-end reads that were uniquely shared on a branch of the phylogeny that defines a monophyletic lineage (Fig.…”

Section: Pipeline Comparisons On E Coli Genomes Data Set (mentioning)

confidence: 97%

“…Previously, it has been demonstrated that different phylogenies can be obtained for the same dataset using either RAxML or FastTree2 (Pettengill et al, 2014). To test this result across multiple phylogenetic inference methods, the NASP E. coli read dataset was used.…”

Section: Phylogeny Differences For the Same Dataset (mentioning)

confidence: 99%

“…The program lyve-SET has been applied to outbreak investigations and uses raw or simulated reads to identify SNPs (Katz et al, 2013). Finally, the CFSAN SNP pipeline is a published method from the United States Food and Drug Administration that only supports the use of raw reads (Pettengill et al, 2014). There have been, to our knowledge, no published comparative studies to compare the functionality of these pipelines on a range of test datasets.…”

Section: Introduction (mentioning)

confidence: 99%

NASP: an accurate, rapid method for the identification of SNPs in WGS datasets that supports flexible input and output formats

et al. 2016

Whole-genome sequencing (WGS) of bacterial isolates has become standard practice in many laboratories. Applications for WGS analysis include phylogeography and molecular epidemiology, using single nucleotide polymorphisms (SNPs) as the unit of evolution. NASP was developed as a reproducible method that scales well with the hundreds to thousands of WGS data typically used in comparative genomics applications. In this study, we demonstrate how NASP compares with other tools in the analysis of two real bacterial genomics datasets and one simulated dataset. Our results demonstrate that NASP produces similar, and often better, results in comparison with other pipelines, but is much more flexible in terms of data input types, job management systems, diversity of supported tools and output formats. We also demonstrate differences in results based on the choice of the reference genome and choice of inferring phylogenies from concatenated SNPs or alignments including monomorphic positions. NASP represents a source-available, version-controlled, unit-tested method and can be obtained from tgennorth.github.io/NASP.

“…In the United States, nationwide real-time whole-genome sequencing (WGS) was implemented using the GenomeTrakr and PulseNet network to enhance listeriosis outbreak detection and investigation (14). In several outbreak investigations, the U.S. Centers for Disease Control and Prevention (CDC) had employed a whole-genome multilocus sequence typing (wgMLST) tool that targets the allelic differences in genome-wide coding regions (14), and the U.S. Food and Drug Administration (FDA) had employed a reference-based Center for Food Safety and Applied Nutrition (CFSAN) SNP Pipeline that identifies single nucleotide polymorphisms (SNPs) in the entire genome, including core genes, accessory genes, and intergenic regions (8, 11, 15). …”

Section: Introduction (mentioning)

confidence: 99%

Listeria monocytogenes in Stone Fruits Linked to a Multistate Outbreak: Enumeration of Cells and Whole-Genome Sequencing

Chen, Burall, Luo, et al. 2016

Appl Environ Microbiol

In 2014, the identification of stone fruits contaminated with Listeria monocytogenes led to the subsequent identification of a multistate outbreak. Simultaneous detection and enumeration of L. monocytogenes were performed on 105 fruits, each weighing 127 to 145 g, collected from 7 contaminated lots. The results showed that 53.3% of the fruits yielded L. monocytogenes (lower limit of detection, 5 CFU/fruit), and the levels ranged from 5 to 2,850 CFU/fruit, with a geometric mean of 11.3 CFU/fruit (0.1 CFU/g of fruit). Two serotypes, IVb-v1 and 1/2b, were identified by a combination of PCR- and antiserum-based serotyping among isolates from fruits and their packing environment; certain fruits contained a mixture of both serotypes. Single nucleotide polymorphism (SNP)-based whole-genome sequencing (WGS) analysis clustered isolates from two case-patients with the serotype IVb-v1 isolates and distinguished outbreak-associated isolates from pulsed-field gel electrophoresis (PFGE)-matched, but epidemiologically unrelated, clinical isolates. The outbreak-associated isolates differed by up to 42 SNPs. All but one serotype 1/2b isolate formed another WGS cluster and differed by up to 17 SNPs. Fully closed genomes of isolates from the stone fruits were used as references to maximize the resolution and to increase our confidence in prophage analysis. Putative prophages were conserved among isolates of each WGS cluster. All serotype IVb-v1 isolates belonged to singleton sequence type 382 (ST382); all but one serotype 1/2b isolate belonged to clonal complex 5.

IMPORTANCE: WGS proved to be an excellent tool to assist in the epidemiologic investigation of listeriosis outbreaks. The comparison at the genome level contributed to our understanding of the genetic diversity and variations among isolates involved in an outbreak or isolates associated with food and environmental samples from one facility. Fully closed genomes increased our confidence in the identification and comparison of accessory genomes. The diversity among the outbreak-associated isolates and the inclusion of PFGE-matched, but epidemiologically unrelated, isolates demonstrate the high resolution of WGS. The prevalence and enumeration data could contribute to our further understanding of the risk associated with Listeria monocytogenes contamination, especially among high-risk populations.

“…This eliminates any biases potentially introduced due to the selection of a reference and allows for the detection of SNVs not present in the reference genome. However, as noted by Pettengill et al, a reference-free approach may lead to a higher SNV false discovery rate without appropriate thresholds (177). The software package kSNP (178,179) takes a reference-free approach to identifying SNVs by breaking up each genomic data set into k-mers and comparing these k-mers.…”

Section: Phylogenetics To Phylogenomics (mentioning)

confidence: 99%

A Primer on Infectious Disease Bacterial Genomics

et al. 2016

SUMMARY: The number of large-scale genomics projects is increasing due to the availability of affordable high-throughput sequencing (HTS) technologies. The use of HTS for bacterial infectious disease research is attractive because one whole-genome sequencing (WGS) run can replace multiple assays for bacterial typing, molecular epidemiology investigations, and more in-depth pathogenomic studies. The computational resources and bioinformatics expertise required to accommodate and analyze the large amounts of data pose new challenges for researchers embarking on genomics projects for the first time. Here, we present a comprehensive overview of a bacterial genomics project from beginning to end, with a particular focus on the planning and computational requirements for HTS data, and provide a general understanding of the analytical concepts to develop a workflow that will meet the objectives and goals of HTS projects.
