2014
DOI: 10.1186/s13059-014-0555-3

Determining the quality and complexity of next-generation sequencing data without a reference genome

Abstract: We describe an open-source kPAL package that facilitates an alignment-free assessment of the quality and comparability of sequencing datasets by analyzing k-mer frequencies. We show that kPAL can detect technical artefacts such as high duplication rates, library chimeras, contamination and differences in library preparation protocols. kPAL also successfully captures the complexity and diversity of microbiomes and provides a powerful means to study changes in microbial communities. Together, these features make…

Citations: cited by 27 publications (30 citation statements)
References: 43 publications (52 reference statements)
“…Additional information can be extracted through pairwise comparisons of WGS datasets (Anvar et al., 2014), which can identify problematic samples by highlighting differences between spectra.…”
Section: Introduction (mentioning)
confidence: 99%
“…Indexing HULK utilises the LSH Forest self-tuning indexing scheme as employed in our previous work (Rowe and Winn, 2018). Briefly, this scheme will take a query and return a subset of nearest-neighbour candidates, based on the number of hash collisions (Bawa et al., 2005). The two parameters to tune this index are (i) the number of hash functions to encode an item (K) and (ii) the number of hash tables to split an item into (L).…”
Section: Distance Estimation (mentioning)
confidence: 99%
“…For example, the pairwise comparison of k-mer spectra is a de novo analysis method that has been routinely used in recent years for clustering microbiomes using dissimilarity measures (Dubinkina et al., 2016; Benoit et al., 2016). These measures are used to identify microbiome composition changes in studies that involve longitudinal sampling or multiple isolation sites (Anvar et al., 2014). However, k-mer spectra can still take considerable time to compute, are relatively large in file size and new sample comparisons require additional computation.…”
Section: Introduction (mentioning)
confidence: 99%
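
The pairwise comparison of k-mer spectra mentioned in the statement above can be illustrated with a minimal Python sketch. It is not kPAL's interface and does not claim to reproduce the distance measures used in any of the cited papers: the function names `kmer_spectrum`, `normalise` and `spectrum_distance`, the choice of Euclidean distance, and the rule of skipping k-mers that contain characters other than A, C, G and T are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_spectrum(reads, k):
    """Count every k-mer (substring of length k) across a list of reads.

    Hypothetical helper: k-mers containing anything other than A, C, G, T
    (for example N) are skipped, which is an assumption of this sketch.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def normalise(counts):
    """Turn raw k-mer counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def spectrum_distance(counts_a, counts_b):
    """Euclidean distance between two normalised k-mer spectra.

    Normalising first keeps datasets of different sequencing depth
    comparable; the distance measure itself is an illustrative choice.
    """
    freq_a, freq_b = normalise(counts_a), normalise(counts_b)
    kmers = set(freq_a) | set(freq_b)
    return sqrt(sum((freq_a.get(m, 0.0) - freq_b.get(m, 0.0)) ** 2 for m in kmers))


if __name__ == "__main__":
    sample_a = kmer_spectrum(["ACGTACGT", "ACGTTTGA"], k=3)
    sample_b = kmer_spectrum(["TTTTGGGG", "ACGTACGT"], k=3)
    print(spectrum_distance(sample_a, sample_b))
```

In this sketch a distance of zero means the two datasets have identical relative k-mer frequencies, and larger values indicate more dissimilar spectra. Storing full spectra for realistic values of k is what makes them large and slow to compute, as the quoted statement notes.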
“…This could be attributed to a variety of reasons, including the use of a poor reference genome with missing genetic information (Anvar et al. 2014); disease-causing variants are non-coding SNPs or splice-site mutations (Koboldt et al. 2013). To identify candidate disease genes within FAME loci, we propose in silico gene prioritization.…”
Section: Introduction (mentioning)
confidence: 99%