The analysis of next-generation sequencing (NGS) data is often a fragmented, stepwise process. For example, multiple pieces of software are typically needed to map NGS reads, extract variant sites, and construct a DNA sequence matrix containing only single nucleotide polymorphisms (i.e., a SNP matrix) for a set of individuals. Managing and chaining these pieces of software and their outputs can be cumbersome and difficult. Here, we present CFSAN SNP Pipeline, which combines into a single package the mapping of NGS reads to a reference genome with Bowtie2, processing of the resulting mapping (BAM) files using SAMtools, identification of variant sites using VarScan, and production of a SNP matrix using custom Python scripts. We also introduce a Python package (CFSAN SNP Mutator) that, given a reference genome, generates variants at known positions against which we validate our pipeline. We created 1,000 simulated Salmonella enterica subsp. enterica serovar Agona genomes at 100× and 20× coverage, each containing 500 SNPs, 20 single-base insertions and 20 single-base deletions. For the 100× dataset, the CFSAN SNP Pipeline recovered 98.9% of the introduced SNPs and had a false positive rate of 1.04 × 10⁻⁶; for the 20× dataset, 98.8% of SNPs were recovered and the false positive rate was 8.34 × 10⁻⁷. Based on these results, CFSAN SNP Pipeline is a robust and accurate tool that is among the first to combine into a single executable the myriad steps required to produce a SNP matrix from NGS data. Such a tool is useful to those working in an applied setting (e.g., food safety traceback investigations) as well as to those interested in evolutionary questions.
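To make the idea of a SNP matrix concrete, here is a minimal sketch, not CFSAN SNP Pipeline's actual implementation. It assumes that variant calling has already been reduced to a hypothetical per-sample mapping from 0-based reference position to alternate base. The function name `build_snp_matrix` and this input format are illustrative assumptions. The sketch takes the union of variant positions across all samples and emits, for each sample, a sequence containing only the bases at those positions.

```python
from typing import Dict, List


def build_snp_matrix(reference: str, calls: Dict[str, Dict[int, str]]) -> Dict[str, str]:
    """Hypothetical helper: build a SNP-only sequence for each sample.

    reference: reference genome sequence, indexed from 0.
    calls: sample name -> {0-based position: alternate base} for that sample's SNPs.
    """
    # Union of variant positions across all samples, in genome order.
    positions: List[int] = sorted({pos for snps in calls.values() for pos in snps})
    for pos in positions:
        if not 0 <= pos < len(reference):
            raise ValueError(f"position {pos} is outside the reference")
    matrix: Dict[str, str] = {}
    for sample, snps in calls.items():
        # Simplifying assumption: a sample with no call at a position is taken to
        # carry the reference base there. A real pipeline may instead mark such
        # positions as missing (e.g. low coverage).
        matrix[sample] = "".join(snps.get(pos, reference[pos]) for pos in positions)
    return matrix


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    calls = {"s1": {2: "T"}, "s2": {2: "T", 7: "C"}, "s3": {}}
    for name, seq in build_snp_matrix(ref, calls).items():
        print(f">{name}\n{seq}")
```

With the toy input above, the matrix has two columns (reference positions 2 and 7), giving `TT`, `TC` and `GT` for `s1`, `s2` and `s3`. Every row has the same length, so the output can be written as an aligned FASTA file for downstream phylogenetic or distance analysis.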
Continental lithosphere formed and reworked during the Palaeoproterozoic era is a major component of pre-1070 Ma Australia and the East Antarctic Shield. Within this lithosphere, the Mawson Continent encompasses the Gawler–Adélie Craton in southern Australia and Antarctica, and crust of the Miller Range, Transantarctic Mountains, which are interpreted to have assembled during c. 1730–1690 Ma tectonism of the Kimban–Nimrod–Strangways orogenies. Recent geochronology has strengthened correlations between the Mawson Continent and the Shackleton Range (Antarctica), but the potential for Meso- to Neoproterozoic rifting and/or accretion events prevents any confident extension of the Mawson Continent to include the Shackleton Range. The proposed later addition (c. 1600–1550 Ma) of the Coompana Block and its Antarctic extension provides the final component of the Mawson Continent. A new model proposed for the late Archaean to early Mesoproterozoic evolution of the Mawson Continent highlights important timelines in the tectonic evolution of the Australian lithosphere. The Gawler–Adélie Craton and adjacent Curnamona Province are interpreted to share correlatable timelines with the North Australian Craton at c. 2500–2430 Ma, c. 2000 Ma, 1865–1850 Ma, 1730–1690 Ma and 1600–1550 Ma. These common timelines are used to suggest that the Gawler–Adélie Craton and North Australian Craton formed a contiguous continental terrain during the entirety of the Palaeoproterozoic. Revised palaeomagnetic constraints for global correlation of proto-Australia highlight an apparently static relationship with northwestern Laurentia during the c. 1730–1590 Ma time period. These data have important implications for many previously proposed reconstruction models and are used as a primary constraint in the configuration of the reconstruction model proposed herein.
This palaeomagnetic link strengthens previous correlations between the Wernecke region of northwestern Laurentia and terrains in the eastern margin of proto-Australia.