Slider—maximum use of probability information for alignment of short sequence reads and SNP detection

Malhis, Nawar; Butterfield, Yaron S.N.; Ester, Martin; Jones, Steven J.M.

doi:10.1093/bioinformatics/btn565

Cited by 45 publications

(30 citation statements)

References 12 publications


“…The short read lengths and the high error rates of reads generated by the Illumina GA pose new computational challenges for the accurate detection of SNPs. Many methods have been developed for aligning short reads with multiple errors to a reference sequence and SNP calling for the Illumina GA (Li et al., 2009a; Malhis et al., 2009). In particular, MAQ represents an efficient, easy-to-use and popular tool for read alignment and SNP calling.…”

Section: Discussion

mentioning

confidence: 99%

Accurate detection and genotyping of SNPs utilizing population sequencing data

Bansal, Harismendy, Tewhey, et al. 2010

Genome Res.


Next-generation sequencing technologies have made it possible to sequence targeted regions of the human genome in hundreds of individuals. Deep sequencing represents a powerful approach for the discovery of the complete spectrum of DNA sequence variants in functionally important genomic intervals. Current methods for single nucleotide polymorphism (SNP) detection are designed to detect SNPs from single individual sequence data sets. Here, we describe a novel method SNIP-Seq (single nucleotide polymorphism identification from population sequence data) that leverages sequence data from a population of individuals to detect SNPs and assign genotypes to individuals. To evaluate our method, we utilized sequence data from a 200-kilobase (kb) region on chromosome 9p21 of the human genome. This region was sequenced in 48 individuals (five sequenced in duplicate) using the Illumina GA platform. Using this data set, we demonstrate that our method is highly accurate for detecting variants and can filter out false SNPs that are attributable to sequencing errors. The concordance of sequencing-based genotype assignments between duplicate samples was 98.8%. The 200-kb region was independently sequenced to a high depth of coverage using two sequence pools containing the 48 individuals. Many of the novel SNPs identified by SNIP-Seq from the individual sequencing were validated by the pooled sequencing data and were subsequently confirmed by Sanger sequencing. We estimate that SNIP-Seq achieves a low false-positive rate of ~2%, improving upon the higher false-positive rate for existing methods that do not utilize population sequence data. Collectively, these results suggest that analysis of population sequencing data is a powerful approach for the accurate detection of SNPs and the assignment of genotypes to individual samples.


“…If an exact match of a seed s exists, then we extend it to the whole read and find mismatches for the whole read (Algorithm 3, lines 9-11). If mismatches are less than e then a read with its location in the reference genome is yielded (Algorithm 3, lines 12-13). If the mismatches are more than e then the edit distance for the whole read is calculated and if this computed edit distance is less than e, then a read with its location in the reference genome is outputted (Algorithm 3, lines 14-17).…”

Section: Phase III-Read Mapping

mentioning

confidence: 99%
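The seed-then-verify step described in the statement above can be sketched as follows. This is a minimal illustration, not the cited paper's Algorithm 3: the function names (`hamming`, `edit_distance`, `map_read`), the choice of the read prefix as the seed, and the fixed-length reference window are all assumptions made for the example.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def map_read(read, reference, seed_len, e):
    """Return (position, how) pairs for a read (hypothetical helper).

    Assumption: the seed is the first seed_len bases of the read, and the
    read is compared against a reference window of the same length.
    """
    seed = read[:seed_len]
    hits = []
    start = reference.find(seed)
    while start != -1:
        window = reference[start:start + len(read)]
        if len(window) == len(read):
            if hamming(read, window) < e:
                # Few enough mismatches: accept without further work.
                hits.append((start, "mismatch"))
            elif edit_distance(read, window) < e:
                # Too many mismatches, but indels explain the difference.
                hits.append((start, "edit"))
        start = reference.find(seed, start + 1)
    return hits
```

The mismatch count is checked first because it is linear in the read length, while the edit distance is quadratic; the more expensive computation runs only when the cheap test fails.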

“…To bypass the large memory requirement, Slider [13] proposes a sequence alignment by merge-sorting the reference genome subsequences and read sequences. Recently, string matching algorithms based on the Burrows-Wheeler Transform (BWT) [14], which is a string compression technique, have drawn the attention of many research groups.…”

mentioning

confidence: 99%
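The merge-sorting idea mentioned in the statement above can be illustrated with a small sketch. It covers exact matches only and is not Slider's implementation, which also uses base probability information; the function name `sort_merge_exact_matches` and the assumption that all reads share one length are introduced here for illustration.

```python
def sort_merge_exact_matches(reference, reads):
    """Find exact read placements by merging two sorted lists.

    Assumption: every read has the same length k. No hash index over the
    reference is built; both sides are sorted and scanned once.
    Returns sorted (read_index, reference_position) pairs.
    """
    if not reads:
        return []
    k = len(reads[0])
    # All length-k subsequences of the reference, tagged with their position.
    ref_kmers = sorted((reference[i:i + k], i)
                       for i in range(len(reference) - k + 1))
    sorted_reads = sorted((r, idx) for idx, r in enumerate(reads))

    matches = []
    i = j = 0
    while i < len(ref_kmers) and j < len(sorted_reads):
        kmer = ref_kmers[i][0]
        read, read_idx = sorted_reads[j]
        if kmer < read:
            i += 1
        elif kmer > read:
            j += 1
        else:
            # Report every reference position that carries this sequence.
            ii = i
            while ii < len(ref_kmers) and ref_kmers[ii][0] == read:
                matches.append((read_idx, ref_kmers[ii][1]))
                ii += 1
            # Keep i in place so a duplicate read sees the same run.
            j += 1
    return sorted(matches)
```

Because both lists are sorted, they can in principle be streamed from disk in order, which is the memory advantage the statement alludes to; this in-memory version only shows the merge logic.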

StreamAligner: a streaming based sequence aligner on Apache Spark

Rathee, Kashyap 2018

J Big Data


“…Similarity-based methods use sequence identity to determine how alike sequences are: BLAST [18,19,21] and number of mismatches [22-25] are commonly used measures. Sequence composition methods use instead intrinsic features of the sequences to determine their similarity, such as their GC-content [26] or k-nucleotide frequencies.…”

mentioning

confidence: 99%

Accurate Taxonomic Assignment of Short Pyrosequencing Reads

2009


Ambiguities in the taxonomy dependent assignment of pyrosequencing reads are usually resolved by mapping each read to the lowest common ancestor in a reference taxonomy of all those sequences that match the read. This conservative approach has the drawback of mapping a read to a possibly large clade that may also contain many sequences not matching the read. A more accurate taxonomic assignment of short reads can be made by mapping each read to the node in the reference taxonomy that provides the best precision and recall. We show that given a suffix array for the sequences in the reference taxonomy, a short read can be mapped to the node of the reference taxonomy with the best combined value of precision and recall in time linear in the size of the taxonomy subtree rooted at the lowest common ancestor of the matching sequences. An accurate taxonomic assignment of short reads can thus be made with about the same efficiency as when mapping each read to the lowest common ancestor of all matching sequences in a reference taxonomy. We demonstrate the effectiveness of our approach on several metagenomic datasets of marine and gut microbiota.
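The contrast between lowest-common-ancestor mapping and precision/recall mapping can be made concrete with a small sketch. Several things here are assumptions, not details from the abstract: the taxonomy is a plain dictionary of child lists, the combined value is taken to be the F-measure (harmonic mean of precision and recall), and the whole tree is traversed for simplicity where the abstract describes restricting the work to the subtree rooted at the lowest common ancestor. The function name `best_node` is hypothetical.

```python
def best_node(children, root, matches):
    """Pick the taxonomy node with the best F-measure for a read.

    children: dict mapping a node to its list of child nodes; leaves are
              reference sequences and have no entry (or an empty list).
    matches:  the leaves (sequences) that match the read.
    Returns (node, f_measure). Ties favor the deeper node, because
    children are scored before their parent and only a strictly better
    score replaces the current best.
    """
    matches = set(matches)
    best = [-1.0, None]

    def visit(node):
        kids = children.get(node, [])
        if not kids:
            total, hit = 1, int(node in matches)
        else:
            total = hit = 0
            for child in kids:
                t, h = visit(child)
                total += t
                hit += h
        if hit:
            precision = hit / total        # matching leaves / leaves below node
            recall = hit / len(matches)    # matching leaves below node / all matches
            f = 2 * precision * recall / (precision + recall)
            if f > best[0]:
                best[0], best[1] = f, node
        return total, hit

    visit(root)
    return best[1], best[0]
```

For a tree where clade `A` holds three sequences (two matching) and clade `B` holds four (one matching), the lowest common ancestor of the matches is the root, which also covers four non-matching sequences, while the F-measure criterion selects `A`.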
