2015
DOI: 10.1101/020024
Preprint

SAM/BAM format v1.5 extensions for de novo assemblies

Abstract: Summary: The plain-text Sequence Alignment/Map (SAM) file format and its companion binary form (BAM) make up a generic format for storing read alignments against reference sequences (as well as unmapped reads), along with structured metadata. Driven by the needs of the 1000 Genomes Project, which sequenced many individual human genomes, early SAM/BAM usage focused on pairwise alignments of reads to a reference. However, the CIGAR P (padding) operator also allows multiple sequence alignments to be preserved. Herein we…
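To make the role of the CIGAR P operator concrete, here is a minimal Python sketch (not code from the preprint) that parses a CIGAR string and counts how many read bases and reference bases it covers. The consume/do-not-consume sets follow the CIGAR operator table in the SAM specification as we understand it: P, like H, consumes neither the read nor the reference, so padding can shift a read within a multiple-alignment layout without changing its sequence or reference span. The function names `parse_cigar` and `consumed_lengths` are hypothetical.

```python
import re
from typing import List, Tuple

# One CIGAR operation: a length followed by an operator code.
_OP_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# A whole CIGAR string: one or more operations with nothing in between.
_FULL_RE = re.compile(r"(?:\d+[MIDNSHP=X])+")

# Operators that consume read (query) bases or reference bases.
# P (padding) and H (hard clip) consume neither.
CONSUMES_QUERY = frozenset("MIS=X")
CONSUMES_REF = frozenset("MDN=X")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string into (length, op) pairs; '*' means unavailable."""
    if cigar == "*":
        return []
    if not _FULL_RE.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in _OP_RE.findall(cigar)]


def consumed_lengths(ops: List[Tuple[int, str]]) -> Tuple[int, int]:
    """Return (query_bases, reference_bases) covered by the operations."""
    query = sum(n for n, op in ops if op in CONSUMES_QUERY)
    ref = sum(n for n, op in ops if op in CONSUMES_REF)
    return query, ref


if __name__ == "__main__":
    for cigar in ("8M2I4M1D3M", "3M1P1I4M", "3M1I4M"):
        print(cigar, consumed_lengths(parse_cigar(cigar)))
```

Running it prints `(17, 16)`, `(8, 7)` and `(8, 7)`: dropping the `1P` from the second string leaves both counts unchanged, because padding only affects how the read is laid out against other sequences in the alignment.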

Cited by 14 publications (12 citation statements). References 14 publications.
“…Concatenated assemblies were cleaned of Illumina artifact low-complexity (long homopolymer) contigs with prinseq-lite v0.20.4 (70), removing contigs consisting of greater than 80% one nucleotide. Mixed-assembled reads were then mapped back to their original metagenomes using bowtie2 v2.2.6 (71), and sam files were converted to bam files and sorted using samtools v1.5 (72) and deduplicated with picard v2.10.3 (https://github.com/broadinstitute/picard.git). Mixed-assembly contigs were binned into putative MAGs with metabat2 v2.12.1 (73), and bins were validated with CheckM (74), hmmer v3.1b2 (75), and pplacer v1.1 alpha19 (76).…”
Section: Methods (mentioning)
confidence: 99%
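The low-complexity filter in the statement above reduces to a simple per-contig rule. The Python sketch below applies that stated rule (drop a contig if more than 80% of it is a single nucleotide) directly; it is not prinseq-lite's implementation. The function names, the in-memory dict of contigs, and the choice to count ambiguous bases such as N toward a contig's length but not toward any single nucleotide are all assumptions.

```python
from collections import Counter
from typing import Dict


def max_single_base_fraction(seq: str) -> float:
    """Fraction of the sequence made up of its most common nucleotide (A/C/G/T)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    # Only A/C/G/T are tallied; N and other symbols still count toward len(seq).
    counts = Counter(base for base in seq if base in "ACGT")
    if not counts:
        return 0.0
    return max(counts.values()) / len(seq)


def drop_low_complexity(contigs: Dict[str, str], threshold: float = 0.8) -> Dict[str, str]:
    """Keep contigs whose dominant nucleotide is at most `threshold` of their length."""
    return {
        name: seq
        for name, seq in contigs.items()
        if max_single_base_fraction(seq) <= threshold
    }


if __name__ == "__main__":
    contigs = {
        "polyA": "AAAAAAAAAC",  # 90% A: removed
        "mixed": "ACGTACGTAA",  # 40% A: kept
        "edge": "AAAAAAAACC",   # exactly 80% A: kept, since only "greater than 80%" is removed
    }
    print(sorted(drop_low_complexity(contigs)))
```

The comparison is `<=` so that a contig at exactly 80% survives, matching the "greater than 80%" wording; the example prints `['edge', 'mixed']`.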
“…Briefly, the draft genome was first indexed using BWA (version 0.7.15-r1140) [24] and the basecalled reads were aligned to the draft genome using BWA. SAMtools (version 1.6 using htslib 1.6) [25] was then used to sort and index the alignment. Nanopolish then computed the new consensus sequence in 50 kb blocks in parallel, which were then merged into the polished assembly.…”
Section: Long-read Basecalling, De Novo Assembly and Genome Polishing (mentioning)
confidence: 99%
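The block-wise polishing described in the statement above follows a split, process in parallel, merge pattern. The Python sketch below illustrates only that pattern: it tiles each contig into non-overlapping half-open 50 kb blocks, runs a per-block step in a thread pool, and concatenates the results in order. `polish_block` is a hypothetical stand-in that returns the draft slice unchanged; it does not call Nanopolish, and real tools may use overlapping windows to stitch block boundaries, which this sketch omits.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

BLOCK_SIZE = 50_000  # 50 kb blocks, as in the quoted statement

Block = Tuple[str, int, int]  # (contig name, start, end), half-open [start, end)


def make_blocks(contig_lengths: Dict[str, int], block_size: int = BLOCK_SIZE) -> List[Block]:
    """Tile every contig into consecutive, non-overlapping blocks."""
    blocks: List[Block] = []
    for name, length in contig_lengths.items():
        for start in range(0, length, block_size):
            blocks.append((name, start, min(start + block_size, length)))
    return blocks


def polish_block(draft: Dict[str, str], block: Block) -> str:
    """Hypothetical stand-in for a consensus step: returns the draft slice as-is."""
    name, start, end = block
    return draft[name][start:end]


def merge_blocks(blocks: List[Block], polished: Dict[Block, str]) -> Dict[str, str]:
    """Concatenate per-block sequences back into whole contigs, in coordinate order."""
    parts: Dict[str, List[str]] = {}
    for block in sorted(blocks, key=lambda b: (b[0], b[1])):
        parts.setdefault(block[0], []).append(polished[block])
    return {name: "".join(seqs) for name, seqs in parts.items()}


if __name__ == "__main__":
    draft = {"ctg1": "ACGT" * 30_000, "ctg2": "GATTACA" * 1_000}
    blocks = make_blocks({name: len(seq) for name, seq in draft.items()})
    # Each block is processed independently, so the work can run in parallel.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda b: polish_block(draft, b), blocks))
    merged = merge_blocks(blocks, dict(zip(blocks, results)))
    print(len(blocks), merged == draft)
```

With the identity stand-in, the 120 kb contig yields three blocks and the 7 kb contig one, and merging reproduces the draft exactly, so the script prints `4 True`.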
“…However, lack of a closed reference genome may cause biases in allelic proportions due to mapping errors. Updates to the sequence alignment format (SAM/BAM) to accommodate storing de novo sequence alignments can resolve this issue (Cock et al., 2015). For reference-based SNP discovery, reads can fail to align to regions of high divergence (Bertels et al., 2014) if the short read aligner is too stringent.…”
Section: Introduction (mentioning)
confidence: 99%