2017
DOI: 10.1038/s41598-017-02487-5

FastGT: an alignment-free method for calling common SNVs directly from raw sequencing reads

Abstract: We have developed a computational method that counts the frequencies of unique k-mers in FASTQ-formatted genome data and uses this information to infer the genotypes of known variants. FastGT can detect the variants in a 30x genome in less than 1 hour using ordinary low-cost server hardware. The overall concordance with the genotypes of two Illumina “Platinum” genomes is 99.96%, and the concordance with the genotypes of the Illumina HumanOmniExpress is 99.82%. Our method provides k-mer database that can be use…
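The idea in the abstract, counting allele-specific k-mers in raw reads and inferring a genotype from the counts, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the function names, the example reads, and especially the fixed `min_count` threshold, which is a hypothetical stand-in and not the calling model used by FastGT. Reverse-complement handling and FASTQ parsing are omitted.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer in the reads (forward strand only, for brevity)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def call_genotype(counts, ref_kmer, alt_kmer, min_count=2):
    """Naive threshold call (hypothetical; not the published calling model)."""
    ref_seen = counts[ref_kmer] >= min_count
    alt_seen = counts[alt_kmer] >= min_count
    if ref_seen and alt_seen:
        return "0/1"  # both alleles supported: heterozygous
    if alt_seen:
        return "1/1"  # only the alternative allele supported
    if ref_seen:
        return "0/0"  # only the reference allele supported
    return "./."      # neither allele has enough support: no call


# Toy reads covering one known SNV (C/T at the third base of the 5-mer).
reads = ["ACGTACGGA", "CGTACGGAT", "ACGTATGGA", "CGTATGGAT"]
counts = count_kmers(reads, k=5)
genotype = call_genotype(counts, ref_kmer="TACGG", alt_kmer="TATGG")
```

In this toy input each allele-specific 5-mer is seen twice, so the sketch reports a heterozygous call. A real caller would only count the pre-selected k-mers rather than every k-mer in the data, and would model sequencing depth instead of using a fixed cutoff.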

Cited by 40 publications (43 citation statements)
References 41 publications
“…These genomic alterations are typically detected by genotype calling on mapped reads (e.g., Samtools mpileup [89] and GATK HaplotypeCaller [90]). However, alignment-free tools (FastGT [73] and LAVA [71]) allow for genotyping of known variants directly from next-generation sequencing data, based on k-mer analysis. Since these methods are 1–2 orders of magnitude faster than traditional mapping-based detection, they seem to be ideally suited for clinical applications, where sequencing data from a large number of individuals need to be processed in a timely manner.…”
Section: How Are Alignment-free Methods Used In Next-generation Seque… (mentioning)
confidence: 99%
“…‘PhenotypeSeeker prediction’ uses the regression model generated by ‘PhenotypeSeeker modeling’ to conduct fast phenotype predictions on input samples (Fig 1). Using gmer_counter from the FastGT package [17], the tool searches the samples only for the k-mers used as parameters in the regression model. Predictions are then made based on the presence or absence of these k-mers.…”
Section: Results (mentioning)
confidence: 99%
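The prediction step described in this statement, searching a sample only for the k-mers that appear in a model and predicting from their presence or absence, might look roughly like the sketch below. It does not use or imitate the gmer_counter interface; the function names, the weights, and the intercept are hypothetical values chosen for illustration, and all model k-mers are assumed to have the same length.

```python
def kmer_presence(sample_seqs, model_kmers):
    """Return a 0/1 presence flag for each model k-mer in the sample."""
    k = len(next(iter(model_kmers)))  # assumes one shared k-mer length
    seen = set()
    for seq in sample_seqs:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in model_kmers:  # ignore k-mers the model does not use
                seen.add(kmer)
    return {kmer: int(kmer in seen) for kmer in sorted(model_kmers)}


def linear_score(presence, weights, intercept):
    """Linear predictor over presence flags (hypothetical model form)."""
    return intercept + sum(weights[kmer] * flag for kmer, flag in presence.items())


# Hypothetical model: two 4-mers with made-up regression weights.
weights = {"ACGT": 1.5, "TTTT": -0.5}
presence = kmer_presence(["GGACGTGG"], weights)
score = linear_score(presence, weights, intercept=0.2)
```

Restricting the lookup to the model's k-mers keeps memory use proportional to the model size rather than to the size of the sample.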
“…More precisely, we will analyze whether the k-mers of a given signature are present in the reads and use such information as an hint of the presence of the allele. Unlike other approaches [27], Definition 2 admits the presence of the alleles of multiple variants in a single signature, allowing MALVA to manage variants that are not k-isolated. Indeed, the set of signatures of an allele represents all the genomic regions where the allele appears in the genomes encoded by the VCF file.…”
Section: Preliminaries (mentioning)
confidence: 99%
“…Their strategy is to create a dictionary for both the reference genome and the SNP list that maps each k-mer to the positions at which it appears, and then to call variants from the reads by evaluating k-mers frequency. FastGT [27] is yet another k-mer-based method to genotype sequencing data: it strongly relies on a pre-compiled database of bi-allelic SNVs and corresponding k-mers, obtained by subjecting the k-mers that overlap known SNVs to several filtering steps. Such filters remove from the database the SNPs for which unique k-mers (i.e., not occurring elsewhere in the reference genome) are not observed, those that are closely located (i.e., that are less than k bases apart), and others: after the filtering steps, only 64% of bi-allelic SNVs survive and are therefore identifiable.…”
Section: Introduction and Related Work (mentioning)
confidence: 99%
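Two of the filters mentioned in this statement, dropping SNVs that lie fewer than k bases from another SNV and dropping SNVs with no unique overlapping k-mer, can be sketched on a toy reference as follows. This is an illustrative simplification, not the published database-building procedure: the function name and the example sequence are made up, only reference-allele k-mers are checked, and the other filters the statement alludes to are not modeled.

```python
from collections import Counter


def filter_snvs(reference, snv_positions, k):
    """Keep 0-based SNV positions that pass two toy filters.

    1. No other SNV lies fewer than k bases away.
    2. At least one reference k-mer overlapping the position occurs
       exactly once in the reference.
    """
    ref_counts = Counter(
        reference[i:i + k] for i in range(len(reference) - k + 1)
    )
    positions = sorted(snv_positions)
    kept = []
    for pos in positions:
        crowded = any(
            other != pos and abs(other - pos) < k for other in positions
        )
        if crowded:
            continue
        # Start indices of all k-mers that cover this position.
        starts = range(max(0, pos - k + 1), min(pos, len(reference) - k) + 1)
        if any(ref_counts[reference[s:s + k]] == 1 for s in starts):
            kept.append(pos)
    return kept


# Toy reference: a poly-A repeat followed by a non-repetitive stretch.
reference = "AAAAAAAAAA" + "CGTCAGGCTTACCGATG"
kept = filter_snvs(reference, snv_positions=[4, 14, 16, 23], k=4)
```

Here position 4 sits inside the repeat, so every overlapping 4-mer is non-unique, and positions 14 and 16 are closer than k to each other; only position 23 survives. This mirrors why, per the statement, only a fraction of bi-allelic SNVs remain identifiable after filtering.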