2023
DOI: 10.1016/j.jmb.2023.167963
ISPRED-SEQ: Deep Neural Networks and Embeddings for Predicting Interaction Sites in Protein Sequences

Cited by 10 publications (15 citation statements)
References 39 publications
“…Residue-level representations are then concatenated together, leading to vectors of 2304 dimensions for each residue in the sequences. The concatenation of embeddings obtained with different pLMs has been shown to improve the performance in previous works (Manfredi et al., 2022, 2023).…”

Section: Protein Encoding

confidence: 95%
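The quoted statement describes concatenating per-residue embeddings from different protein language models (pLMs) into a single 2304-dimensional vector per residue. A minimal sketch of that operation, assuming two hypothetical pLMs whose illustrative embedding sizes (1024 and 1280) sum to 2304; the actual models and dimensions used by ISPRED-SEQ are not specified here:

```python
import numpy as np

# Hypothetical per-residue embeddings from two different pLMs for a
# protein of length L; the dimensions below are illustrative only.
L = 50
emb_a = np.random.rand(L, 1024)  # residue embeddings from one pLM
emb_b = np.random.rand(L, 1280)  # residue embeddings from another pLM

# Concatenate along the feature axis so that each residue gets one
# combined vector (1024 + 1280 = 2304 dimensions, matching the
# dimensionality mentioned in the quoted statement).
combined = np.concatenate([emb_a, emb_b], axis=1)
print(combined.shape)  # (50, 2304)
```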
“…The next strategy for protein sequence representation is the use of embeddings and encoders. Encoders such as one hot encodings can be used to represent protein sequences, while embeddings such as Word2Vec, FastText, and BERT [7,34,43,44,48,67,88] are commonly used to transform the protein sequence into an interpretable feature for a neural network. The main advantage to this strategy is that it keeps some resemblance of the entire protein sequence while also providing an interpretable feature representation for a neural network [43].…”
Section: Embeddings
confidence: 99%
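The statement above contrasts one-hot encoders with learned embeddings such as Word2Vec or BERT. A minimal sketch of the simpler option, one-hot encoding over the 20 standard amino acids; the alphabet ordering and function name are illustrative, not taken from any cited work:

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues

def one_hot_encode(seq: str) -> np.ndarray:
    """Encode a protein sequence as an L x 20 one-hot matrix,
    one row per residue with a single 1 at the residue's index."""
    idx = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    mat = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        mat[pos, idx[aa]] = 1.0
    return mat

encoded = one_hot_encode("MKV")
print(encoded.shape)  # (3, 20)
```

Unlike learned embeddings, this representation carries no similarity structure between residues; each amino acid is equidistant from every other.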
“…However, models that predict local interaction tend to use a combination of statistical, physiochemical representations as well as some representation of the overall protein sequence that captures local features of the protein (e.g., overall fold or domains). As described in the previous sections, protein sequence representations encompass encoding methods such as metric representations, text embeddings, and neural network feature embeddings, but some groups have also leveraged raw protein sequences [39,42,44,67,86,87]. Using unprocessed protein sequences for PPI prediction creates an issue for neural network architectures since most models depend on an input of fixed length.…”
Section: Raw Protein Sequences
confidence: 99%
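The statement above notes that raw, variable-length sequences clash with neural network architectures expecting fixed-size input. A common workaround is truncation plus padding; the sketch below is a hypothetical illustration of that idea (the pad character and helper name are assumptions), not a method from the cited papers:

```python
def to_fixed_length(seq: str, max_len: int, pad_char: str = "X") -> str:
    """Truncate or right-pad a raw protein sequence to a fixed length,
    so every input presented to the network has the same size."""
    return seq[:max_len].ljust(max_len, pad_char)

print(to_fixed_length("MKVLA", 8))     # 'MKVLAXXX'
print(to_fixed_length("MKVLAAGTR", 8)) # 'MKVLAAGT'
```

Padding positions are typically masked out downstream so they do not contribute to the prediction.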