Preprint, 2016
DOI: 10.1101/064733

KAT: A K-mer Analysis Toolkit to quality control NGS datasets and genome assemblies

Abstract: Motivation: De novo assembly of whole genome shotgun (WGS) next-generation sequencing (NGS) data benefits from high-quality input with high coverage. However, in practice, determining the quality and quantity of useful reads quickly and in a reference-free manner is not trivial. Gaining a better understanding of the WGS data, and how that data is utilized by assemblers, provides useful insights that can inform the assembly process and result in better assemblies. Results: We present the K-mer Analysis Toolkit …
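Reference-free assessment of read data, as described in the abstract, starts from k-mer counting. The sketch below builds a k-mer spectrum (multiplicity mapped to the number of distinct k-mers seen that many times). It assumes reads are already plain strings in memory (no FASTQ parsing), counts canonical k-mers (a k-mer and its reverse complement pooled), and skips windows containing non-ACGT bases. The function names are hypothetical; this is an illustration of the concept, not KAT's implementation, which relies on a dedicated, memory-efficient k-mer counter.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_ACGT = frozenset("ACGT")

def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)

def count_kmers(reads, k):
    """Count canonical k-mers across all reads (hypothetical helper).

    Windows containing any base other than A/C/G/T (e.g. N) are skipped.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= _ACGT:
                counts[canonical(kmer)] += 1
    return counts

def kmer_spectrum(counts):
    """Histogram: multiplicity -> number of distinct k-mers with that multiplicity."""
    return Counter(counts.values())
```

For example, `kmer_spectrum(count_kmers(["ACGTAC", "GTACGT"], 3))` gives `Counter({4: 2})`: the canonical k-mers `ACG` and `GTA` each occur four times. On real WGS data, sequencing-error k-mers pile up at multiplicity 1 or 2, while genuine genomic k-mers form a peak near the k-mer coverage.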

Cited by 124 publications (155 citation statements)
References 10 publications

“…The remaining 152 contigs which were not scaffolded in the 12 chromosomes were stitched together to create chromosome 00 (9.6 Mbp) with a 100 bp gap inserted between adjacent contigs. Genome assemblies can be validated for completeness by comparing the k-mer's present in the raw genomic DNA-Seq reads with those in the genome assembly (Mapleson et al 2017). The kmer spectra of SL4.0 genome assembly shows a single homozygous peak at the expected (20x) coverage based on k-mer analysis (Suppl.…”
Section: De Novo Assembly SL4.0 (mentioning; confidence: 99%)

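The completeness check described in this statement, comparing k-mers in the raw reads with those in the assembly, can be sketched as a single ratio. This sketch assumes read k-mers seen fewer than a hypothetical `min_count` times are sequencing errors and ignores them, then reports the fraction of the remaining ("solid") read k-mers that also occur in the assembly. The names and the threshold are illustrative assumptions, not KAT's exact estimator.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_ACGT = frozenset("ACGT")

def canonical_kmers(seq, k):
    """Yield canonical k-mers of seq, skipping windows with non-ACGT bases."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= _ACGT:
            yield min(kmer, kmer.translate(_COMPLEMENT)[::-1])

def assembly_completeness(reads, assembly_seqs, k, min_count=2):
    """Fraction of solid read k-mers present in the assembly (hypothetical helper).

    min_count is an assumed error cutoff: read k-mers below it are discarded.
    """
    read_counts = Counter(km for r in reads for km in canonical_kmers(r, k))
    solid = {km for km, c in read_counts.items() if c >= min_count}
    if not solid:
        return 0.0
    asm = {km for s in assembly_seqs for km in canonical_kmers(s, k)}
    return len(solid & asm) / len(solid)
```

With three copies of `ACGTTGCA` plus one read carrying an error (`ACGTTGCT`), the error k-mer falls below the cutoff, so an assembly of `ACGTTGCA` scores 1.0 at k = 4, while a truncated assembly `ACGTTG` recovers only 3 of the 5 solid k-mers and scores 0.6.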
“…These data were derived from a lab reared colony of biotype 4 which was unlikely to be contaminated by parasitoid wasp larvae. The Canu assembly was checked for contamination by creating a GC-content coverage plot using KAT sect from the K-mer analysis toolkit (KAT) (Mapleson et al 2017). For this analysis, all biotype 4 Illumina MiSeq libraries from Wenger et.…”
Section: Reassembly of A. glycines Biotype (mentioning; confidence: 99%)

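A GC-content versus coverage view, as used here for contamination screening, places each assembled sequence by its GC fraction and its coverage in the read k-mers; sequences from a contaminating organism often form a separate cluster because their GC content or coverage differs from the target genome. The sketch below computes, per contig, the GC fraction and the median read k-mer count over the contig's k-mers. It is a hypothetical illustration of the idea with assumed names and in-memory inputs, not the code or output format of KAT sect.

```python
from collections import Counter
from statistics import median

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_ACGT = frozenset("ACGT")

def canonical_kmers(seq, k):
    """Yield canonical k-mers of seq, skipping windows with non-ACGT bases."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= _ACGT:
            yield min(kmer, kmer.translate(_COMPLEMENT)[::-1])

def gc_fraction(seq):
    """Fraction of G+C among unambiguous bases (0.0 if there are none)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0

def gc_coverage_table(reads, contigs, k):
    """Return (name, GC fraction, median read k-mer coverage) for each contig.

    contigs is assumed to be a dict of name -> sequence (hypothetical input).
    """
    read_counts = Counter(km for r in reads for km in canonical_kmers(r, k))
    rows = []
    for name, seq in contigs.items():
        # Counter returns 0 for k-mers never seen in the reads.
        covs = [read_counts[km] for km in canonical_kmers(seq, k)]
        rows.append((name, gc_fraction(seq), median(covs) if covs else 0))
    return rows
```

Plotting the second column against the third, one point per contig, gives the kind of scatter in which an off-cluster group of contigs would flag possible contamination.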
“…A KAT k-mer spectra copy number plot provides information to analyze how much and what type of k-mer content from reads is present in an assembly (Mapleson et al 2017). It decomposes the kmer spectrum of a read data set by the frequency in which the kmers are encountered in the assembly.…”
Section: Genome Assembly (mentioning; confidence: 99%)

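The spectra-cn decomposition described in this statement splits the read k-mer spectrum by how many times each k-mer occurs in the assembly. The sketch below builds that table: for each assembly copy number (0, 1, 2, ..., with the top bucket pooling everything at or above a hypothetical `max_cn`), it records a histogram of read multiplicities, and assembly k-mers absent from the reads are recorded at read multiplicity 0. Names, bucket limits, and the in-memory inputs are assumptions for illustration; this is not KAT's implementation or plotting code.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_ACGT = frozenset("ACGT")

def canonical_kmers(seq, k):
    """Yield canonical k-mers of seq, skipping windows with non-ACGT bases."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= _ACGT:
            yield min(kmer, kmer.translate(_COMPLEMENT)[::-1])

def count(seqs, k):
    """Canonical k-mer counts over a collection of sequences."""
    return Counter(km for s in seqs for km in canonical_kmers(s, k))

def spectra_cn(reads, assembly, k, max_freq=100, max_cn=4):
    """Return {copy_number: Counter(read_multiplicity -> distinct k-mers)}.

    Copy numbers >= max_cn share the max_cn bucket; read multiplicities
    above max_freq share the max_freq bin (both limits are assumptions).
    """
    read_counts = count(reads, k)
    asm_counts = count(assembly, k)
    table = {cn: Counter() for cn in range(max_cn + 1)}
    for km, freq in read_counts.items():
        cn = min(asm_counts[km], max_cn)  # 0 if the assembly lacks this k-mer
        table[cn][min(freq, max_freq)] += 1
    # Assembly k-mers never seen in the reads sit at read multiplicity 0.
    for km, cn in asm_counts.items():
        if km not in read_counts:
            table[min(cn, max_cn)][0] += 1
    return table
```

Read this way, a large copy-number-0 mass at the main coverage peak suggests content missing from the assembly, while copy-number-2 k-mers at the single-copy peak suggest duplicated sequence.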
“…Contigs were scaffolded using the PE, LMP, and TALL reads and the SOAPdenovo2 (Luo et al 2012) prepare→map→scaffold pipeline, run at k = 71. Contigs and scaffolds were quality controlled using KAT spectra-cn plots (Mapleson et al 2017) to assess motif representation.…”
Section: Genome Assembly (mentioning; confidence: 99%)