2021 (preprint)
DOI: 10.1101/2021.05.24.445464

ProteinBERT: A universal deep-learning model of protein sequence and function

Abstract: Self-supervised deep language modeling has shown unprecedented success across natural language tasks, and has recently been repurposed to biological sequences. However, existing models and pretraining methods are designed and optimized for text analysis. We introduce ProteinBERT, a deep language model specifically designed for proteins. Our pretraining scheme consists of masked language modeling combined with a novel task of Gene Ontology (GO) annotation prediction. We introduce novel architectural elements th…
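To make the dual-objective pretraining described in the abstract concrete, here is a minimal sketch of how a masked-token (local) prediction loss can be combined with a multi-label GO annotation (global) prediction loss in one objective. It is written in PyTorch purely for illustration; the layer sizes, vocabulary size, GO-term count, and class names are assumed placeholders, not ProteinBERT's actual architecture or hyperparameters.

```python
import torch
import torch.nn as nn

# Illustrative sizes only; ProteinBERT's actual vocabulary, hidden sizes,
# and GO-term count differ.
VOCAB_SIZE = 26
NUM_GO_TERMS = 8943

class DualTaskHead(nn.Module):
    """Toy head combining a per-residue (local) masked-token output with a
    per-protein (global) multi-label GO annotation output."""
    def __init__(self, hidden_dim: int = 128):
        super().__init__()
        self.token_head = nn.Linear(hidden_dim, VOCAB_SIZE)   # local output
        self.go_head = nn.Linear(hidden_dim, NUM_GO_TERMS)    # global output

    def forward(self, residue_states, protein_state):
        # residue_states: (batch, seq_len, hidden); protein_state: (batch, hidden)
        return self.token_head(residue_states), self.go_head(protein_state)

def pretraining_loss(token_logits, token_targets, go_logits, go_targets):
    # Cross-entropy over corrupted/masked residues plus binary cross-entropy
    # over the protein's GO labels, summed into one self-supervised objective.
    lm_loss = nn.functional.cross_entropy(
        token_logits.transpose(1, 2), token_targets, ignore_index=-100)
    go_loss = nn.functional.binary_cross_entropy_with_logits(go_logits, go_targets)
    return lm_loss + go_loss
```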


Cited by 48 publications (74 citation statements). References 33 publications.
“…Another viable option is recurrent and attention-based neural networks, which have enough computational power to describe relevant dependencies in protein sequences [108, 109, 110]. However, while modern neural networks have been successfully applied to the annotation of protein families [111, 112], their performance in modeling short protein sequence fragments has yet to be evaluated.…”
Section: Discussion (mentioning)
Confidence: 99%
“…Among them, UDSMProt [60], an LSTM sequence model trained on unlabeled Swiss-Prot protein sequences in a self-supervised autoregressive manner, has shown remarkable performance on protein-level classification tasks after fine-tuning. ProteinBERT [61], another convolution- and attention-based model pre-trained on sequence-correction and GO annotation prediction tasks, has shown impressive performance on protein-level regression tasks after fine-tuning. We want to explore the possibility of combining ACP-MHCNN with fine-tuned versions of these pre-trained models for ACP identification in future work.…”
Section: Discussion (mentioning)
Confidence: 99%
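The statement above concerns fine-tuning pretrained protein language models for protein-level tasks such as ACP identification. The sketch below shows one generic way to attach a task head to a pretrained encoder; the `ProteinTaskHead` class, the encoder interface, and the embedding size are illustrative assumptions, not the UDSMProt or ProteinBERT APIs.

```python
import torch
import torch.nn as nn

class ProteinTaskHead(nn.Module):
    """Generic fine-tuning wrapper: a pretrained encoder producing one pooled
    embedding per protein, followed by a small task-specific classifier."""
    def __init__(self, encoder: nn.Module, embed_dim: int, num_classes: int = 2):
        super().__init__()
        self.encoder = encoder                # pretrained; updated during fine-tuning
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids):
        pooled = self.encoder(token_ids)      # (batch, embed_dim), assumed interface
        return self.classifier(pooled)        # protein-level logits

# Typical usage (names and hyperparameters are placeholders):
# model = ProteinTaskHead(pretrained_encoder, embed_dim=512, num_classes=2)
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# loss = nn.functional.cross_entropy(model(batch_tokens), batch_labels)
```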
“…Second, we use UniRef50 (62) clustering to split the data, to model a challenging use-case in which an unseen sequence has low sequence similarity to anything that has been previously annotated. Note that there are alternative methods for splitting (48, 63, 64), such as reserving the most recently annotated proteins for evaluating models. This approach, which is used in CAFA and CASP (63, 64), helps ensure a fair competition because labels for the evaluation data are not available to participants, or to the scientific community at large, until after the competition submissions are due.…”
Section: A Machine-Learning Compatible Dataset for Protein Function Prediction (mentioning)
Confidence: 99%
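Cluster-based splitting, as described in the statement above, keeps near-duplicate sequences on the same side of the train/test boundary. Below is a minimal sketch assuming a precomputed protein-to-cluster mapping (e.g., UniRef50 cluster IDs); `cluster_split` is a hypothetical helper, not the cited dataset's actual procedure.

```python
import random
from collections import defaultdict

def cluster_split(protein_to_cluster, test_fraction=0.2, seed=0):
    """Split proteins by cluster ID (e.g., UniRef50 membership) so that every
    member of a cluster lands in the same partition, preventing highly similar
    sequences from leaking between train and test."""
    clusters = defaultdict(list)
    for protein, cluster_id in protein_to_cluster.items():
        clusters[cluster_id].append(protein)
    cluster_ids = sorted(clusters)
    random.Random(seed).shuffle(cluster_ids)
    n_test = int(len(cluster_ids) * test_fraction)
    test_ids = set(cluster_ids[:n_test])
    train = [p for c in cluster_ids[n_test:] for p in clusters[c]]
    test = [p for c in test_ids for p in clusters[c]]
    return train, test

# Example: cluster_split({"P12345": "UniRef50_A", "Q67890": "UniRef50_A"})
# keeps both members of UniRef50_A on the same side of the split.
```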
“…Beyond functional annotation, deep learning has enabled significant advances in protein structure prediction (31-36), predicting the functional effects of mutations (37-40), and protein design (41-47). A key departure from traditional approaches is that researchers have started to incorporate vast amounts of raw, uncurated sequence data into model training, an approach which also shows promise for functional prediction (48). Of particular relevance to the present work is Bileschi et al. (2019) (49), where it is shown that models with residual layers (50) of dilated convolutions (51) can precisely and efficiently categorise protein domains.…”
Section: Introduction (mentioning)
Confidence: 99%
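The statement above refers to residual layers of dilated convolutions for protein domain classification. The following is a minimal sketch of one such block in PyTorch; the kernel size, bottleneck layout, and activation are illustrative assumptions rather than the exact published architecture.

```python
import torch
import torch.nn as nn

class ResidualDilatedBlock(nn.Module):
    """One residual block of dilated 1-D convolutions over a protein sequence."""
    def __init__(self, channels: int, dilation: int):
        super().__init__()
        # padding = dilation keeps the sequence length unchanged for kernel_size=3,
        # so blocks with growing dilation can be stacked to widen the receptive field.
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3,
                               padding=dilation, dilation=dilation)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=1)
        self.act = nn.ReLU()

    def forward(self, x):
        # x: (batch, channels, seq_len)
        h = self.act(self.conv1(x))
        h = self.conv2(h)
        return self.act(x + h)

# A stack with dilations such as (1, 2, 4, 8) covers increasingly long-range
# context while remaining cheap to evaluate on full-length sequences.
```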