2020
DOI: 10.1101/2020.04.03.023523
Preprint

PhANNs, a fast and accurate tool and web server to classify phage structural proteins

Abstract: For any given bacteriophage genome or phage sequences in metagenomic data sets, we are unable to assign a function to 50-90% of genes. Structural protein-encoding genes constitute a large fraction of the average phage genome and are among the most divergent and difficult-to-identify genes using homology-based methods. To understand the functions encoded by phages, their contributions to their environments, and to help gauge their utility as potential phage therapy agents, we have developed a new approach to cla…

Cited by 17 publications
(9 citation statements)
References 32 publications
(21 reference statements)
“…Within the 24 main target PCs, a majority of sequences (71%) did not display any significant sequence similarity to any known protein domain outside of the C-terminal VR region, even when using highly-sensitive annotation tools such as HHblits 23. Hence, we instead classified targets into broad functional classes, namely “structural proteins” vs “unknown” for viral-encoded DGRs and “membrane-bound” proteins vs “unknown” for cellular-encoded DGRs, using non-similarity-based protein annotation approaches 24, 25 (Supplementary Note 8).…”
Section: Results (citation type: mentioning)
confidence: 99%
“…Annotations were derived from hits with a score of ≥50 in hmmsearch or ≥90% probability in hhblits, except for hits overlapping the prediction VR region for which these cutoffs were lowered to ≥30 on score and ≥80% probability, in order to enable the identification of distantly related C-lectin folds. In addition, individual target sequences were also searched for transmembrane domains and signal peptides using TMHMM 63 v2.0c (default parameters) and SignalP 64 v4.1 (score D ≥ Dmaxcut), and searched for potential Caudovirales structural proteins (capsid or tail proteins) using DeepCapTail v3038c4d 24 (version downloaded Jan. 2020) and PhANNs v1.0.0 25 with thresholds of ≥0.9 and ≥0.2 on the score, respectively. The same clustering and annotation pipeline were applied to predicted cds from NCBI RefSeq Caudovirales genomes, after having dereplicated these protein sequences at 99% using cd-hit 44 v4.8.1 (n = 250,209 proteins), in order to evaluate the functional annotation across all Caudovirales of (i) sequences containing an Ig-like domain (ii) sequences predicted as “capsid” or “tail” via DeepCapTail (see Supplementary Note 8).…”
Section: Methods (citation type: mentioning)
confidence: 99%
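
The cutoffs quoted in the statement above can be sketched as a small filter. This is only an illustration of the stated thresholds, not the citing authors' pipeline: the `Hit` class, the function names, and the dictionary layout are hypothetical, and the rule that a protein counts as structural when either tool passes its cutoff is an assumption the quote does not confirm.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one annotation hit (not a real tool's output format)."""
    tool: str          # "hmmsearch" or "hhblits"
    value: float       # score for hmmsearch, probability in percent for hhblits
    overlaps_vr: bool  # True if the hit overlaps the VR region


# Cutoffs from the quoted statement: (standard, relaxed for VR-overlapping hits).
ANNOTATION_CUTOFFS = {
    "hmmsearch": (50.0, 30.0),
    "hhblits": (90.0, 80.0),
}

# Score thresholds from the quoted statement for the two structural-protein predictors.
STRUCTURAL_CUTOFFS = {"DeepCapTail": 0.9, "PhANNs": 0.2}


def keep_hit(hit: Hit) -> bool:
    """Return True if the hit passes the cutoff that applies to it."""
    standard, relaxed = ANNOTATION_CUTOFFS[hit.tool]
    return hit.value >= (relaxed if hit.overlaps_vr else standard)


def is_structural(scores: dict) -> bool:
    """Assumption: flag a protein if at least one tool meets its own threshold."""
    return any(
        scores.get(tool, 0.0) >= cutoff
        for tool, cutoff in STRUCTURAL_CUTOFFS.items()
    )
```

For example, `keep_hit(Hit("hmmsearch", 45.0, True))` passes only because the hit overlaps the VR region, where the score cutoff drops from 50 to 30.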
“…Diversity generating retroelements were identified in members of cluster1819 by using both the myDGR web server [67] and MetaCSST(v1.0) [68] tools. To predict whether the target genes identified were putative tail fibre genes, as has previously been suggested [69], the proteins from all members were analysed with PhANNs(v1.0.0) [70] and the most significant hit to a tail fibre gene was carried forward. These were then compared with the results from the previous tools.…”
Section: Identification Of Diversity Generating Retroelements In the Most Abundant Prophage Cluster (citation type: mentioning)
confidence: 99%
“…However, among the limitations of our findings is the use of database-dependent analyses. In the future, tools such as artificial neural networks (Mendez et al, 2019; Cantu et al, 2020) can be used in bioinformatic and chemoinformatic approaches to shed light on sequence and mass-spectrometry data that is absent in repositories. Furthermore, less invasive sampling techniques could be used to capture molecular dynamics over relevant temporal scales (e.g., before and after stress events).…”
Section: Current Limitations and Future Directions (citation type: mentioning)
confidence: 99%