2020
DOI: 10.1101/2020.06.30.180448
Preprint

Unifying the known and unknown microbial coding sequence space

Abstract: Bridging the gap between the known and the unknown coding sequence space is one of the biggest challenges in molecular biology today. This challenge is especially extreme in microbiome analyses, where between 40% and 60% of the coding sequences detected are of unknown function, and ignoring this fraction limits our understanding of microbial systems. Discarding the uncharacterized fraction is not an option anymore. Here, we present an in-depth exploration of the microbial…

Cited by 26 publications (37 citation statements)
References 127 publications
“…Our current lack of understanding of many eukaryotic functional genes even within the scope of model organisms 58 can explain the limits of reference-based approaches to study the gene content of eukaryotic plankton. Thus, to gain further insights and overcome these limitations, we partitioned and categorized the eukaryotic gene content with AGNOSTOS 59 . AGNOSTOS grouped 5.4 million genes in 424,837 groups of genes sharing remote homologies, adding 2.3 million genes left uncharacterized by the EggNOG annotation.…”
Section: A Complex Interplay Between the Evolution And Functioning Of
Citation type: mentioning
confidence: 99%
“…Eukaryotic SMAGs integration in the AGNOSTOS-DB. We used the AGNOSTOS workflow to integrate the protein coding genes predicted from the SMAG into a variant of the AGNOSTOS-DB that contains 1,829 metagenomes from the marine and human microbiomes, 28,941 archaeal and bacterial genomes from the Genome Taxonomy Database (GTDB) and 3,243 nucleocytoplasmic large DNA viruses (NCLDV) metagenome assembled genomes (MAGs) 59 .…”
Section: Biogeography Of SMAGs
Citation type: mentioning
confidence: 99%
“…Unfortunately, over half of all proteins do not have detectable homologs in standard sequence databases due to their distant evolutionary relationships [13]. Detecting these remote homologs would help us better understand mutagenesis [14], aid protein design [15], predict protein function [16], predict protein structure [17,18,19] and model evolution [20].…”
Section: Introduction
Citation type: mentioning
confidence: 99%