2019
DOI: 10.1101/812917
Preprint

A new method for rapid genome classification, clustering, visualization, and novel taxa discovery from metagenome

Abstract: Classifying taxa, including those that have not previously been identified, is a key task in characterizing the microbial communities of under-described habitats, including permanently ice-covered lakes in the dry valleys of the Antarctic. Current supervised phylogeny-based methods fall short on recognizing species assembled from metagenomic datasets from such habitats, as they are often incomplete or lack closely known relatives. Here, we report an efficient software suite, "Genome Constellation", that is cap…

citations
Cited by 5 publications
(4 citation statements)
references
References 55 publications
“…This dataset has a total uncompressed size of 164.8GB. The Antarctic Lake Metagenome Dataset was downloaded from the JGI GOLD database with the accession no Gs0118069 (https://gold.jgi.doe.gov/study?id=Gs0118069). It was derived from samples taken from two meromictic lakes in Antarctica (Wang et al, 2019). This dataset has twelve files totalling 1.37TB.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…meta-RAY (Boisvert et al, 2012) uses MPI to distribute large metagenome assembly to multiple computer nodes. To overcome its limitation that it only assembles very abundant species, hybrid strategies have been developed to first use meta-RAY in a computer cluster to assemble abundant species (which often comprise most of the sequencing data), followed by MEGAHIT or metaSPAdes in a single node to assemble unassembled reads (Wang et al, 2019). Recently, MetaHipMer used UPC++ to assemble very large metagenome datasets with high accuracy and efficiency (Hofmeyr et al, 2020), but it runs best on a supercomputer that is not readily available to most researchers.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…id=Gs0118069). It was derived from samples taken from two meromictic lakes in Antarctica [24]. This dataset has twelve files totalling 1.37 TB.…”
Section: Availability of Data and Materials
Citation type: mentioning
confidence: 99%
“…meta-RAY [3] uses MPI to distribute a large metagenome assembly to multiple computer nodes. To overcome its limitation that it only assembles very abundant species, hybrid strategies have been developed to first use meta-RAY in a computer cluster to assemble abundant species (which often comprise most of the sequencing data), followed by MEGAHIT or metaSPAdes in a single node to assemble unassembled reads [24]. Recently, MetaHipMer used UPC++ to assemble very large metagenome datasets with high accuracy and efficiency [6], but it runs best on a supercomputer that is not readily available to most researchers.…”
Citation type: mentioning
confidence: 99%