2019
DOI: 10.3389/fgene.2019.01156

Reads Binning Improves Alignment-Free Metagenome Comparison

Abstract: Comparing metagenomic samples is a critical step in understanding the relationships among microbial communities. Recently, next-generation sequencing (NGS) technologies have produced a massive amount of short reads data for microbial communities from different environments. The assembly of these short reads can, however, be time-consuming and challenging. In addition, alignment-based methods for metagenome comparison are limited by incomplete genome and/or pathway databases. In contrast, alignment-free methods…

Citations: cited by 16 publications (13 citation statements)
References: 61 publications (88 reference statements)
“…The first group refers to GC content features, these features are classic ones in gene prediction. The second group refers to k-mer features, these features are widely used in other branches of Bioinformatics such as assembly [25] and binning [30], but still little explored in gene prediction problems. …”
Section: Methods (mentioning)
confidence: 99%
“…Short k-mer (k < 15) based measures, such as …, and CVtree, calculate dissimilarity between sequences or high-throughput sequencing samples (Jiang et al., 2012; Liao et al., 2016; Song et al., 2019) using the global statistical models. Based on long k-mers (k > 21), Mash (Ondov et al., 2016), Skmer (Sarmashghi et al., 2019), and Kmer-db (Deorowicz et al., 2018) use MinHash to approximate Jaccard distance between pairwise sequences based on randomly sampled small set of k-mers.…”
Section: Methods (mentioning)
confidence: 99%
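
The MinHash idea mentioned in the statement above can be illustrated with a minimal bottom-s sketch. This is not the implementation used by Mash, Skmer, or Kmer-db; the function names, the choice of BLAKE2b as the hash, and the default parameters are all assumptions made for illustration. Each sequence is reduced to its s smallest k-mer hashes, and the Jaccard index is estimated from the s smallest hashes of the merged sketches.

```python
import hashlib


def kmer_set(seq, k):
    """Return the set of distinct k-mers in seq (no canonicalization; an illustrative simplification)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer. BLAKE2b is an arbitrary choice for this sketch."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_sketch(seq, k=21, s=1000):
    """Keep only the s smallest k-mer hashes as the sequence's sketch."""
    return sorted({hash_kmer(km) for km in kmer_set(seq, k)})[:s]


def jaccard_estimate(sketch_a, sketch_b, s=1000):
    """Estimate the Jaccard index from two bottom-s sketches.

    Take the s smallest hashes of the union of both sketches and count
    how many of them occur in both sketches.
    """
    a, b = set(sketch_a), set(sketch_b)
    merged = sorted(a | b)[:s]
    if not merged:
        return 0.0
    shared = a & b
    return sum(1 for h in merged if h in shared) / len(merged)


def jaccard_distance_estimate(seq_a, seq_b, k=21, s=1000):
    """Approximate Jaccard distance (1 - Jaccard index) between two sequences."""
    sk_a = bottom_sketch(seq_a, k, s)
    sk_b = bottom_sketch(seq_b, k, s)
    return 1.0 - jaccard_estimate(sk_a, sk_b, s)
```

When s is at least the number of distinct k-mers in both sequences, the estimate coincides with the exact Jaccard index (barring hash collisions); smaller s trades accuracy for a fixed-size sketch.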
“…The first group refers to GC content features, these features are classic ones in gene prediction, being used in tools such as FragGeneScan, Prodigal and Orphelia. The second group refers to k-mer features, these features are widely used in other branches of Bioinformatics such as assembly [30] and binning [31], but still little explored in gene prediction problems. The feature importance index was calculated according to importance method of the Caret package [32] and, as Figure 3 presents the sequence length as the most important one, followed by k-mer features, having more than 80% importance index.…”
Section: Feature Engineering (mentioning)
confidence: 99%
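
The feature groups named in the statement above (sequence length, GC content, and k-mer features) could be computed along the following lines. This is only a sketch under stated assumptions: the citing paper's actual feature definitions are not given here, so the function names, the use of normalized k-mer frequencies over the ACGT alphabet, and the default k = 3 are hypothetical.

```python
from collections import Counter
from itertools import product


def gc_content(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_frequencies(seq, k):
    """Normalized frequency of every possible k-mer over ACGT, in lexicographic order.

    k-mers containing other symbols (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    alphabet = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in alphabet)
    return [counts[w] / total if total else 0.0 for w in alphabet]


def feature_vector(seq, k=3):
    """Concatenate sequence length, GC content, and k-mer frequencies (assumed layout)."""
    return [len(seq), gc_content(seq)] + kmer_frequencies(seq, k)
```

A vector built this way has 2 + 4^k entries, so small k keeps the feature space manageable for short reads.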