2021
DOI: 10.1038/s41587-020-00777-4

Improved metagenome binning and assembly using deep variational autoencoders

Abstract: Identification and reconstruction of microbial species from metagenomic whole-genome sequencing data is an important and challenging task. Existing approaches rely on gene or contig co-abundance information across multiple samples and on k-mer composition information in the sequences. Here we use recent advances in deep learning to develop an algorithm that uses variational autoencoders to encode co-abundance and compositional information prior to clustering. We show that the deep network is able to inte…
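The abstract names two per-contig inputs that are encoded before clustering: k-mer composition and co-abundance across samples. The following is a minimal sketch of assembling such an input vector, assuming canonical tetranucleotide (k = 4) frequencies and per-sample depths rescaled to sum to 1. The function names are hypothetical, and this normalization is an illustrative choice, not necessarily the preprocessing VAMB itself applies.

```python
from functools import lru_cache
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


@lru_cache(maxsize=None)
def canonical_kmers(k: int) -> tuple[str, ...]:
    """Canonical k-mers: the lexicographically smaller of each k-mer/reverse-complement pair."""
    kmers = {"".join(p) for p in product(BASES, repeat=k)}
    return tuple(sorted({min(km, revcomp(km)) for km in kmers}))


def composition(seq: str, k: int = 4) -> list[float]:
    """Canonical k-mer frequency vector of one contig (sums to 1 unless no valid window)."""
    index = {km: i for i, km in enumerate(canonical_kmers(k))}
    counts = [0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(base in BASES for base in kmer):  # skip windows containing N or other symbols
            counts[index[min(kmer, revcomp(kmer))]] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def abundance_profile(depths: list[float]) -> list[float]:
    """Per-sample depths rescaled to sum to 1 (hypothetical normalization)."""
    total = sum(depths)
    return [d / total for d in depths] if total else [0.0] * len(depths)


def contig_features(seq: str, depths: list[float], k: int = 4) -> list[float]:
    """Concatenate composition and co-abundance into one input vector per contig."""
    return composition(seq, k) + abundance_profile(depths)
```

With k = 4 there are 136 canonical tetramers, so a contig observed in two samples yields a 138-dimensional vector; in the described method, vectors like these are what the variational autoencoder compresses before clustering.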

Cited by 296 publications (370 citation statements); references 81 publications.

“…In the following, we compared the performance of Metabinner with other individual binners (CONCOCT [12], MetaBAT [34,13], MaxBin [14,15], VAMB [16]) and ensemble binners (BMC3C [22], MetaWRAP [21] and DAS Tool [19]). Then, we conducted experiments to show the necessity and effectiveness of multiple features and initializations.…”
Section: Results (mentioning)
confidence: 99%
“…MaxBin [14,15] multiplies the probability P_dist and the probability P_cov that a sequence belongs to a bin based on nucleotide frequency distance and coverage, respectively. A deep learning-based binner, VAMB [16], has recently been developed, which utilizes variational autoencoders (VAE) [17] to convert nucleotide information and coverage information for binning. VAMB then clusters the transformed data using an adaptive iterative medoid method.…”
Section: Introduction (mentioning)
confidence: 99%
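The statement above says VAMB clusters the encoded contigs with an adaptive iterative medoid method. The sketch below illustrates the general idea under simplifying assumptions: cosine distance between latent vectors, a fixed radius of 0.05 (a hypothetical value; VAMB chooses its threshold adaptively, which this sketch does not), and the first unclustered point as the seed. Each round gathers points around a medoid, moves the medoid to the member with the most neighbors while that strictly improves, then removes the finished cluster and repeats. All names are hypothetical.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def neighbours(points, center, candidates, radius):
    """Indices in `candidates` within `radius` of `center` (the center itself is always included)."""
    return [j for j in candidates
            if j == center or cosine_distance(points[center], points[j]) <= radius]


def iterative_medoid_clustering(points, radius=0.05, max_steps=25):
    """Greedy medoid clustering with a fixed radius (no adaptive threshold in this sketch)."""
    remaining = list(range(len(points)))
    clusters = []
    while remaining:
        medoid = remaining[0]  # simplified seed choice
        for _ in range(max_steps):
            members = neighbours(points, medoid, remaining, radius)
            # Candidate medoid: the member with the most neighbors among unclustered points.
            best = max(members, key=lambda m: len(neighbours(points, m, remaining, radius)))
            if len(neighbours(points, best, remaining, radius)) <= len(members):
                break  # no strict improvement, so the medoid is stable
            medoid = best
        cluster = neighbours(points, medoid, remaining, radius)
        clusters.append(cluster)
        taken = set(cluster)
        remaining = [j for j in remaining if j not in taken]
    return clusters
```

Because the medoid is always part of its own cluster, every round removes at least one point, so the loop terminates even for degenerate inputs such as all-zero vectors.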
“…Combining long reads with the approach presented here will allow genotyping of complex ecMGEs, such as MGEs that contain insertion sequences (typically <2.5 kb) and short transposons. Longer repeat elements and complex rearrangements that require bridging over more than ~10 kb can be addressed with additional experimental work, such as Hi-C [28,36-38], and by sampling the same community multiple times [39,40].…”
Section: Discussion (mentioning)
confidence: 99%
“…To estimate the main phyla present in the microbiome samples, a taxonomic affiliation was determined using Kaiju [25] for the top 10,585 P contigs with a global abundance above 10 RPKM. To sum up contig abundances by phyla, those which might belong to the same microbial species were binned into clusters based on k-mer composition and co-occurrence with VAMB [26]. A single contig per VAMB cluster was conserved in the abundance matrix, which ended up containing 5862 contigs (~species).…”
Section: Microbiota Read Cleaning, Assembly and Contig Treatments to Constitute the P Reference Dataset (mentioning)
confidence: 99%
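The statement above combines three steps: an abundance filter (above 10 RPKM), a single retained contig per VAMB cluster, and summation of abundances by phylum. The following is a minimal sketch under stated assumptions. RPKM uses the standard formula reads × 10^9 / (contig length in bp × total mapped reads). The retained contig is taken to be the most abundant member of its cluster, which the source does not specify. The `Contig` record and the function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    length: int   # contig length in bp
    reads: int    # reads mapped to this contig
    cluster: str  # VAMB cluster label
    phylum: str   # taxonomic label, e.g. assigned with Kaiju


def rpkm(reads: int, length: int, total_mapped: int) -> float:
    """Reads per kilobase of contig per million mapped reads."""
    return reads * 1e9 / (length * total_mapped)


def phylum_abundance(contigs, total_mapped, min_rpkm=10.0):
    """Sum RPKM by phylum, keeping contigs above min_rpkm and one contig per cluster."""
    # 1. Abundance filter.
    scored = [(c, rpkm(c.reads, c.length, total_mapped)) for c in contigs]
    scored = [(c, v) for c, v in scored if v > min_rpkm]
    # 2. One representative per cluster (keeping the most abundant one is an assumption).
    reps = {}
    for c, v in scored:
        if c.cluster not in reps or v > reps[c.cluster][1]:
            reps[c.cluster] = (c, v)
    # 3. Sum representative abundances by phylum.
    totals = defaultdict(float)
    for c, v in reps.values():
        totals[c.phylum] += v
    return dict(totals)
```

Keeping one contig per cluster avoids counting the same species several times when its genome is fragmented into many contigs.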