2015
DOI: 10.1093/bioinformatics/btv345

ProFET: Feature engineering captures high-level protein functions

Abstract: Supplementary data are available at Bioinformatics online.

citations
Cited by 93 publications
(60 citation statements)
references
References 65 publications
“…For comparison purposes under difficult working conditions with limited or completely missing homology information, additional predictions were generated by a baseline method (Naïve), which ranks GO terms by prevalence in UniProtKB-GOA, and by a sequence similarity-based approach (BLAST), which can transfer annotations only from distantly related and experimentally characterized proteins as detailed in Methods. Other machine-learning based tools for GO term prediction from patterns of biological features could not be included in the study: ProtFun [15] has not been updated in a very long time and only covers a handful of currently valid GO terms, whereas ProFET [18] requires training from scratch classifiers for all GO categories of interest.…”
Section: Results
mentioning
confidence: 99%
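
The Naïve baseline described in the statement above (ranking GO terms by how often they occur in an annotation set) can be sketched roughly as follows. This is an illustrative sketch only, not the cited study's code: the function name `naive_go_ranking` and the toy annotation table are assumptions, and the real baseline draws its counts from UniProtKB-GOA.

```python
from collections import Counter


def naive_go_ranking(annotations):
    """Rank GO terms by the fraction of proteins annotated with them."""
    counts = Counter()
    for terms in annotations.values():
        # Count each term at most once per protein.
        counts.update(set(terms))
    total = len(annotations)
    # Most prevalent first; ties broken by term ID for a stable order.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(term, n / total) for term, n in ordered]


# Hypothetical toy table (protein ID -> GO terms), not real UniProtKB-GOA data.
toy = {
    "P1": ["GO:0005515", "GO:0003677"],
    "P2": ["GO:0005515"],
    "P3": ["GO:0005515", "GO:0016301"],
    "P4": ["GO:0003677"],
}
ranking = naive_go_ranking(toy)
```

Every query protein receives the same ranked list under this scheme, which is why it serves as a floor for comparison rather than as a predictor in its own right.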
“…The observation that the length and position of intrinsically disordered protein regions strongly correlates with some molecular activities and biological processes led to an expanded set of sequence-derived features, which FFPred scans through a library of GO term-specific SVMs to annotate protein chains [16, 17]. A more recent study has confirmed the effectiveness of this feature-based approach with the use of random forests for supervised learning [18].…”
mentioning
confidence: 97%
“…In addition to homology, there exist many AFP methods that exploit additional information extracted from the genome sequence, e.g., conserved gene neighborhoods (Ling et al, 2009), phylogenetic distribution (Pellegrini et al, 1999), protein motifs and biophysical properties (Ofer and Linial, 2015), codon usage biases (Kriško et al, 2014), remote homology (Hawkins et al, 2009; Sokolov and Ben-Hur, 2010), and composition of protein domains (Hunter et al, 2011; Punta et al, 2011). Moreover, inference using genomic information can be further supplemented by experimental data: gene expression (Tian et al, 2008), protein-protein interactions (Cao and Cheng, 2015) or protein structure (Wass et al, 2012), and also by text-mining the scientific literature.…”
Section: Introduction
mentioning
confidence: 99%
“…The main features included are: i) the location of the variant within the protein sequence, ii) the identities of the reference and alternative amino-acids, iii) the score of the amino-acid substitution under various BLOSUM matrices, iv) an abundance of annotations extracted from UniProt, v) amino-acid scales (i.e. various numeric values assigned to amino-acids, as described elsewhere [71, 72]), vi) Pfam domains and Pfam clans. The full specification of all extracted features is available in Supplementary Table S4.…”
Section: Proteomic Features Used By the Prediction Model
mentioning
confidence: 99%
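
A few of the variant features listed in the statement above (variant location, reference and alternative residues, and an amino-acid scale) could be computed along these lines. This is a minimal sketch under stated assumptions: the function `variant_features`, the choice of the Kyte-Doolittle hydropathy scale, and the toy sequence are illustrative and are not taken from the cited work, whose full feature specification is in its Supplementary Table S4.

```python
# Kyte-Doolittle hydropathy scale, used here as one example of an
# "amino-acid scale"; the choice of scale is an assumption.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def variant_features(seq, pos, alt):
    """Describe a substitution at 0-based index `pos` of `seq` to residue `alt`."""
    ref = seq[pos]
    # Position scaled to [0, 1] so proteins of different lengths are comparable.
    rel = pos / (len(seq) - 1) if len(seq) > 1 else 0.0
    return {
        "relative_position": rel,
        "ref": ref,
        "alt": alt,
        "hydropathy_ref": KYTE_DOOLITTLE[ref],
        "hydropathy_alt": KYTE_DOOLITTLE[alt],
        "hydropathy_delta": KYTE_DOOLITTLE[alt] - KYTE_DOOLITTLE[ref],
    }


# Hypothetical five-residue sequence with a V -> D substitution at index 2.
feats = variant_features("MKVLA", 2, "D")
```

Substitution scores under a BLOSUM matrix would be looked up in the same way, from a table keyed by the reference and alternative residues.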