2023
DOI: 10.1093/sysbio/syad037

A k-mer-Based Approach for Phylogenetic Classification of Taxa in Environmental Genomic Data

Abstract: In the age of genome sequencing, whole genome data is readily and frequently generated, leading to a wealth of new information that can be used to advance various fields of research. New approaches, such as alignment-free phylogenetic methods that utilize k-mer-based distance scoring, are becoming increasingly popular given their ability to rapidly generate phylogenetic information from whole genome data. However, these methods have not yet been tested using environmental data, which often tends to be highly f…
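
To make "k-mer-based distance scoring" concrete, here is a minimal sketch of one common alignment-free measure: the Jaccard distance between the sets of k-mers found in two sequences. It is a generic illustration, not the scoring used in the paper; the function names (`kmer_set`, `jaccard_distance`, `distance_matrix`) and the default `k=4` are assumptions chosen for the example.

```python
from __future__ import annotations

from itertools import combinations


def kmer_set(seq: str, k: int) -> set[str]:
    """Return the distinct k-mers in seq, skipping windows with non-ACGT symbols."""
    seq = seq.upper()
    valid = set("ACGT")
    return {
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    }


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(genomes: dict[str, str], k: int = 4) -> dict[tuple[str, str], float]:
    """Pairwise k-mer Jaccard distances, keyed by sorted (name, name) pairs."""
    profiles = {name: kmer_set(seq, k) for name, seq in genomes.items()}
    return {
        (x, y): jaccard_distance(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }
```

A matrix like this could then be passed to a distance-based tree builder such as neighbor joining. Real genome-scale tools typically use much larger k and sketching or counting strategies rather than exact in-memory sets.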

Cited by 7 publications (6 citation statements)
References 81 publications
“…Many authors (cf. Edwards et al 2002; Chan et al 2014; Balaban et al 2022; Van Etten et al 2023; Aledo 2022; Dylus et al 2023) declared that k-mers from unaligned data deliver good phylogenetic information, but our analysis shows that k-mers derived from unaligned sequences contain less information than sequences augmented with alignment information. We agree that unaligned data analysis is a great first and quick step to evaluate phylogenetic relationships, but if great accuracy is needed, then aligned sequences will be necessary.…”
Section: Discussion (mentioning)
confidence: 55%
“…Alignment-free methods for evolutionary analysis have been reviewed (Vinga and Almeida 2003; Zielezinski et al 2019) and their robustness investigated (Chan et al 2014; Bernard et al 2016). For example, they can be used to derive distances to be summarized into phylogenies (Edwards et al 2002; Chan et al 2014; Balaban et al 2022; Van Etten et al 2023). Alignment-free approaches for phylogeny inference have attracted considerable attention in recent years (Dylus et al 2023; Aledo 2022; Bernard et al 2021; Zielezinski et al 2017).…”
Section: Introduction (mentioning)
confidence: 99%
“…These algae are also evolutionarily distinct, having diverged shortly after the emergence of the Galdieria clade approximately 1 billion years ago (Figure A20B; Yoon et al, 2017). Furthermore, each isolate represents the earliest divergence within its respective clade (Rossoni et al, 2019; Van Etten, Stephens, & Bhattacharya, 2023). Despite having indistinguishable morphology and nearly identical genome size and features (e.g., few introns, sparse non‐coding DNA, similar GC content; Rossoni et al, 2019), these divergent organisms (recently established as different species; Park et al, 2023) have likely resided on opposite sides of our planet for hundreds of millions of years, possibly since the breakup of the last supercontinent, Pangaea (Correia & Murphy, 2020).…”
Section: Discussion (mentioning)
confidence: 99%
“…In the face of intricate and varied genomic data, k-mer taxonomic classification techniques have evolved into an essential instrument for researchers. Their high efficiency and generality have led to their extensive application in diverse settings and fields, ranging from environmental genomics to metagenomics, epidemiology, and origin tracking, to name a few (Van Etten et al 2023). More importantly, k-mer methods do not rely on specific genes or genomic structures, which is particularly beneficial for genomes without detailed annotations or those that are highly heterogeneous.…”
Section: Introduction (mentioning)
confidence: 99%