Multi-function Prediction of Unknown Protein Sequences Using Multilabel Classifiers and Augmented Sequence Features

Agrawal, Saurabh; Sisodia, Dilip Singh; Nagwani, Naresh Kumar

doi:10.1007/s40995-021-01134-z

Cited by 3 publications

(2 citation statements)

References 51 publications

“…Then, these embedded vectors generate a per-molecular representation for a substrate. Furthermore, the concatenated representation vector of protein and substrate are imputed into the explainable extra tree model, which has been clarified to be useful on various protein or peptide function prediction tasks [19][20]. We also found that a solely concatenated representation vector cannot discriminate high or low kcat values well by projection of t-distributed stochastic neighbour embedding (t-SNE) [21], further demonstrating the necessity of the machine learning model (Supplementary Fig.…”

Section: Overview of PreKcat (mentioning)

confidence: 88%

Pretrained language models and weight redistribution achieve precise kcat prediction

Luo

2022

Preprint

The enzyme turnover number (kcat) is a meaningful and valuable kinetic parameter, reflecting the catalytic efficiency of an enzyme to a specific substrate, which determines the global proteome allocation, metabolic fluxes and cell growth. Here, we present a precise kcat prediction model (PreKcat) leveraging pretrained language models and a weight redistribution strategy. PreKcat significantly outperforms the previous kcat prediction method in terms of various evaluation metrics. We also confirmed the ability of PreKcat to discriminate enzymes of different metabolic contexts and different types. Additionally, the proposed weight redistribution strategies effectively reduce the prediction error of high kcat values and capture minor effects of amino acid substitutions on two crucial enzymes of the naringenin synthetic pathway, leading to obvious distinctions. Overall, the presented kcat prediction model provides a valuable tool for deciphering the mechanisms of enzyme kinetics and enables novel insights into enzymology and biomedical applications.

“…DM means that the weight of samples with kcat values higher than 5 (logarithm value) would be enhanced. We compared several parameters, including the weight multipliers (5, 10, 20, 50) and whether they were normalized. This resulted in eight optimized model combinations.…”

Section: T-distributed Stochastic Neighbour Embedding (t-SNE) Visuali... (mentioning)

confidence: 99%

Pretrained language models and weight redistribution achieve precise kcat prediction

Luo

2022

Preprint
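
The weighting scheme described in the second citation statement can be sketched roughly as follows. This is an illustrative reading of the quoted sentence, not the PreKcat implementation: the function name `sample_weights`, the example values, and in particular the meaning of "normalized" (here taken to mean rescaling the weights so they average to 1) are assumptions. Only the threshold of 5 on the logarithmic kcat value and the multipliers 5, 10, 20 and 50 come from the quoted text.

```python
from itertools import product


def sample_weights(log_kcat, multiplier, normalize, threshold=5.0):
    """Up-weight samples whose log kcat exceeds `threshold` (hypothetical helper)."""
    weights = [float(multiplier) if v > threshold else 1.0 for v in log_kcat]
    if normalize and weights:
        # Assumption: "normalized" means rescaling so the weights average to 1.
        mean = sum(weights) / len(weights)
        weights = [w / mean for w in weights]
    return weights


# Multipliers quoted in the citation statement.
MULTIPLIERS = (5, 10, 20, 50)

# Four multipliers, each with and without normalization: eight combinations.
configs = list(product(MULTIPLIERS, (False, True)))

if __name__ == "__main__":
    example_log_kcat = [0.3, 2.1, 5.4, 6.0]  # made-up values for illustration
    for multiplier, normalize in configs:
        print(multiplier, normalize,
              sample_weights(example_log_kcat, multiplier, normalize))
```

Weights of this kind would typically be handed to a regressor as per-sample weights during fitting; how the cited preprint applies them is not stated in the excerpt above.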

Long short term memory based functional characterization model for unknown protein sequences using ensemble of shallow and deep features

Agrawal

Sisodia

Nagwani

2021

Neural Comput & Applic

Multilevel characterization of unknown protein sequences using hierarchical long short term memory model

Agrawal

Sisodia

Nagwani

2024

Multimed Tools Appl

