2016
DOI: 10.3389/fmicb.2016.00118

PATtyFams: Protein Families for the Microbial Genomes in the PATRIC Database

Abstract: The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation, and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approache…

Citations: cited by 172 publications (136 citation statements)
References: 47 publications
“…Last year, we developed a new algorithm for generating protein families that could be applied to the entire collection of PATRIC genomes (47). The method works by using the annotation process to guide family formation (4).…”
Section: What's New In Patric? (mentioning)
confidence: 99%
“…Overall, the PATtyFam algorithm is rapid and generates protein families resembling those created by alignment-based algorithms (47). When a genome is submitted to the PATRIC annotation service, protein families are automatically assigned to each protein by projection, thus enabling a user to compare their private genome with the PATRIC collection.…”
Section: What's New In Patric? (mentioning)
confidence: 99%
“…In order to work with clean subsets of genomes, we chose to base analyses on the protein-encoding genes that are shared among members of the same species. We used the "PATtyFam" collection, which is a set of protein families that cover the ~230,000 publicly available genomes in the PATRIC database [40]. Protein similarity for building these families is based on the RAST signature k-mer collection [37], and all proteins must have the same annotation in order to be members of the same family.…”
Section: Core Conserved Gene Sets (mentioning)
confidence: 99%
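
The membership rule described in the statement above (k-mer-based similarity plus an identical annotation) can be illustrated with a minimal Python sketch. This is not the PATtyFam implementation: the function names, the plain Jaccard similarity over all k-mers (rather than RAST signature k-mers), the single-linkage grouping, and the `k` and `min_similarity` values are all assumptions made for illustration.

```python
from collections import defaultdict


def kmers(seq, k=8):
    """Return the set of all length-k substrings of a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_families(proteins, k=8, min_similarity=0.5):
    """Group proteins into families (hypothetical helper, illustration only).

    proteins: dict mapping protein id -> (annotation, sequence).
    Two proteins can share a family only if their annotations are identical
    and they are linked, directly or transitively, by k-mer similarity.
    """
    # Step 1: proteins with different annotations never share a family.
    by_annotation = defaultdict(list)
    for pid, (annotation, _seq) in proteins.items():
        by_annotation[annotation].append(pid)

    families = []
    for annotation, pids in by_annotation.items():
        kmer_sets = {pid: kmers(proteins[pid][1], k) for pid in pids}
        parent = {pid: pid for pid in pids}  # union-find forest

        def find(x):
            # Follow parent links to the root, compressing the path.
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        # Step 2: link pairs whose k-mer sets are similar enough.
        for i, a in enumerate(pids):
            for b in pids[i + 1:]:
                if jaccard(kmer_sets[a], kmer_sets[b]) >= min_similarity:
                    parent[find(a)] = find(b)

        # Step 3: collect the connected components as families.
        groups = defaultdict(list)
        for pid in pids:
            groups[find(pid)].append(pid)
        families.extend((annotation, sorted(m)) for m in groups.values())
    return families
```

The all-against-all comparison inside each annotation group is quadratic, so a sketch like this only suits small inputs; it is meant to show the two constraints, not how they are applied at the scale of the PATRIC collection.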
“…In previous work, we observed that it is possible to build accurate AMR phenotype prediction models from whole genomes without using the AMR genes [21]. In this study, in order to explore the possibility of building models from limited genome sequence data, we chose to build models from core genes that are held in common among the members of a species, and which are not annotated as having a direct role in AMR [37,40]. By being nearly universally conserved, core genes are less likely to be horizontally transferred, and are also useful for assessing genome completeness and phylogeny.…”
Section: Amr Models Based On Core Genes Have Predictive Power (mentioning)
confidence: 99%
“…Specifically, Inparanoid distinguishes between orthologs and in-paralogs, which were duplicated following a given speciation event [4][5][6]. It is then former identify orthologs and in-paralogs using proxy methods rather than directly inferring homology type from gene and species evolutionary history. In practice, graph-based methods have a similar accuracy as tree-based methods [9,10,19].…”
(mentioning)
confidence: 99%