2016
DOI: 10.1093/bioinformatics/btw663

KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies

Abstract: Motivation: De novo assembly of whole genome shotgun (WGS) next-generation sequencing (NGS) data benefits from high-quality input with high coverage. However, in practice, determining the quality and quantity of useful reads quickly and in a reference-free manner is not trivial. Gaining a better understanding of the WGS data, and how that data is utilized by assemblers, provides useful insights that can inform the assembly process and result in better assemblies. Results: We present the K-mer Analysis Toolkit (KAT…

Cited by 434 publications (337 citation statements)
References 10 publications
“…The assemblies were checked for contamination and further manually assessed and corrected using gEVAL [25]. The Kmer Analysis Toolkit (KAT) version 2.4.2 [26] was used to compare k-mers from the 10X Illumina data to k-mers in each of the haplotype-resolved assemblies, and in the combined diploid assembly representing both haplotypes. Phasing of the assembled contigs and scaffolds was visualised using the parental k-mer databases produced by Canu [27].…”
Section: Trio Binning Genome Assembly
Citation type: mentioning
confidence: 99%
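
The read-versus-assembly k-mer comparison described in this statement can be illustrated with a small sketch. This is not KAT's implementation or its command-line interface; it is a toy, pure-Python version under stated assumptions: sequences are plain strings, k-mers are counted in canonical form, and the function names (`canonical_kmers`, `count_kmers`, `compare_spectra`) are hypothetical. The result maps each pair (multiplicity in the reads, copy number in the assembly) to the number of distinct k-mers with that pair, which is the kind of matrix a spectra copy-number plot is drawn from.

```python
from collections import Counter

# Translation table used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Yield canonical k-mers (the smaller of a k-mer and its reverse complement)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip k-mers containing N or other ambiguous bases.
        if set(kmer) <= set("ACGT"):
            rc = kmer.translate(_COMPLEMENT)[::-1]
            yield min(kmer, rc)


def count_kmers(sequences, k):
    """Count canonical k-mers over an iterable of sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(canonical_kmers(seq, k))
    return counts


def compare_spectra(read_counts, assembly_counts):
    """Map (read multiplicity, assembly copy number) to the number of distinct k-mers."""
    matrix = Counter()
    for kmer in read_counts.keys() | assembly_counts.keys():
        pair = (read_counts.get(kmer, 0), assembly_counts.get(kmer, 0))
        matrix[pair] += 1
    return matrix


# Toy data: two reads support the assembled sequence, one read is absent from it.
reads = ["ACGTAC", "ACGTAC", "TTTTGG"]
assembly = ["ACGTAC"]
matrix = compare_spectra(count_kmers(reads, 4), count_kmers(assembly, 4))
```

In this toy data, k-mers in the `(1, 0)` cell occur in the reads but not in the assembly. At real sequencing depth, a large population of such k-mers at high read multiplicity would suggest content missing from the assembly, while low-multiplicity ones are more often sequencing errors.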
“…The raw data from both experiments were examined using fastQC v.0.11.8 60 and K-mer analysis v.2.4.1 (KAT; 61). Long-reads data were also examined using SMRT link analysis v.6.0.0.47836 and stsPlots.…”
Section: Genome Quality Control
Citation type: mentioning
confidence: 99%
“…Secondly, we obtained the unmapped sequencing reads against our genome assembly and re-assembled them into ~359 Mb sequences which contained ~95.05% repetitive sequences (Supplementary Table 5). Using KAT sect tool [26], we also calculated the sequencing depth distribution of flanking sequences of gap regions, we observed that ~71% of sequences nearby junction regions have higher sequencing depth (Supplementary Fig. 3), indicating that majority of the gap regions were repetitive sequences.…”
Section: Analyses
Citation type: mentioning
confidence: 99%
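
The idea behind the depth analysis in this statement, estimating sequencing depth along a sequence from read k-mer counts, can be sketched as follows. This is a simplified illustration, not the KAT sect implementation. It assumes in-memory strings, canonical k-mer counting, and gaps represented as runs of `N`; the function names (`canonical`, `count_read_kmers`, `kmer_coverage_profile`) are hypothetical.

```python
from collections import Counter
from statistics import median

# Translation table used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def count_read_kmers(reads, k):
    """Count canonical k-mers across all reads, skipping ambiguous bases."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def kmer_coverage_profile(sequence, read_counts, k):
    """Read k-mer count at each k-mer start position (0 if unseen or ambiguous)."""
    sequence = sequence.upper()
    profile = []
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= set("ACGT"):
            profile.append(read_counts.get(canonical(kmer), 0))
        else:
            profile.append(0)
    return profile


# Toy scaffold with a two-base gap; the left flank is covered more deeply.
reads = ["ACGTAC"] * 3 + ["TTTTGG"]
profile = kmer_coverage_profile("ACGTACNNTTTTGG", count_read_kmers(reads, 4), 4)
left_depth = median(profile[:3])
right_depth = median(profile[8:])
```

Comparing a per-flank summary such as the median against the genome-wide depth is one way to flag flanks whose k-mers are over-represented in the reads, which is the signal the statement attributes to repeats.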