A Comparative Analysis of Allergen Proteins between Plants and Animals Using Several Computational Tools and Chou’s PseAAC Concept

Behbahani, Mandana; Rabiei, Parisa; Mohabatkar, Hassan

doi:10.1159/000509084

Cited by 7 publications (5 citation statements)

References 49 publications (54 reference statements)


“…Motifs are basically signature sequences that aid in the identification of any protein. The e-value shows accuracy of the predicted motif; less the e-value, more the precision of the possible motifs [56]. From our study, it has been found that the GTs retrieved from all the three different environments based on growth temperatures (i.e., mesophile, thermophile, and hyperthermophile), e-value were less than 3.0e+000.…”

Section: Results (mentioning)

confidence: 99%

Computational Study on Temperature Driven Structure–Function Relationship of Polysaccharide Producing Bacterial Glycosyl Transferase Enzyme

et al. 2021


Glycosyltransferase (GTs) is a wide class of enzymes that transfer sugar moiety, playing a key role in the synthesis of bacterial exopolysaccharide (EPS) biopolymer. In recent years, increased demand for bacterial EPSs has been observed in pharmaceutical, food, and other industries. The application of the EPSs largely depends upon their thermal stability, as any industrial application is mainly reliant on slow thermal degradation. Keeping this in context, EPS producing GT enzymes from three different bacterial sources based on growth temperature (mesophile, thermophile, and hyperthermophile) are considered for in silico analysis of the structural–functional relationship. From the present study, it was observed that the structural integrity of GT increases significantly from mesophile to thermophile to hyperthermophile. In contrast, the structural plasticity runs in an opposite direction towards mesophile. This interesting temperature-dependent structural property has directed the GT–UDP-glucose interactions in a way that thermophile has finally demonstrated better binding affinity (−5.57 to −10.70) with an increased number of hydrogen bonds (355) and stabilizing amino acids (Phe, Ala, Glu, Tyr, and Ser). The results from this study may direct utilization of thermophile-origin GT as best for industrial-level bacterial polysaccharide production.


“…Therefore, more attention has recently been given to bioinformatics and machine learning strategies as potential tools for detecting and classifying food allergens. Among the great variety of methods, intelligence neural networks, supervised learning, support vector machines with linear kernel functions, and different classifiers such as k-nearest neighbor are used as reliable options for identifying, modeling, and predicting allergenic properties. Wang et al. developed a new deep learning model (transformer with a self-attention mechanism combining the learning models Light Gradient Boosting Machine [LightGBM] and eXtreme Gradient Boosting [XGBoost]) for the prediction of food allergens. Machine learning is proving to be a tremendously helpful solution in this field.…”

Section: Introduction (mentioning)

confidence: 99%

Sequence-Based Prediction of Plant Allergenic Proteins: Machine Learning Classification Approach

et al. 2023


This Article proposes a novel chemometric approach to understanding and exploring the allergenic nature of food proteins. Using machine learning methods (supervised and unsupervised), this work aims to predict the allergenicity of plant proteins. The strategy is based on scoring descriptors and testing their classification performance. Partitioning was based on support vector machines (SVM), and a k-nearest neighbor (KNN) classifier was applied. A fivefold cross-validation approach was used to validate the KNN classifier in the variable selection step as well as the final classifier. To overcome the problem of food allergies, a robust and efficient method for protein classification is needed.


“…Subsequently, various approaches are proposed from different perspectives for allergen prediction, such as motif-based approaches, similarity searches, machine learning-based modeling, etc. Prediction tools, such as AllerCatPro 2.0, AlgPred 2.0, have been used extensively for preliminary allergenic risk assessment and identification of new allergens. One representative project derived from the FAO/WHO guidelines was AllerCatPro/AllerCatPro 2.0, which implemented a hierarchical workflow employing five criteria with improved similarity-checking methods for both sequences and 3D structures. The criteria used in AllerCatPro were based on statistical analysis of existing allergens, representing a form of experience/knowledge-based decision-making. For instance, if a protein sequence shares >35% identity with known allergens over 90 windows, it would be classified into the group with strong evidence for allergenic potential. So far, this approach has achieved a great performance in the allergen benchmark data sets.…”

Section: Introduction (mentioning)

confidence: 99%
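The identity rule quoted in the statement above can be illustrated with a small sketch. This is not AllerCatPro's implementation: it assumes a simplified, ungapped position-by-position comparison, whereas real tools rely on sequence alignment. The function names are hypothetical, and the window length is left as a required parameter because the statement does not define it precisely; only the 35% threshold comes from the quoted text.

```python
def window_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length segments."""
    if len(a) != len(b) or not a:
        raise ValueError("segments must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def max_window_identity(query: str, allergen: str, window: int) -> float:
    """Best ungapped identity over all pairs of windows of the given length.

    Returns 0.0 when either sequence is shorter than the window.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    best = 0.0
    for i in range(len(query) - window + 1):
        q = query[i:i + window]
        for j in range(len(allergen) - window + 1):
            best = max(best, window_identity(q, allergen[j:j + window]))
    return best


def exceeds_identity_threshold(query: str, allergen: str, window: int,
                               threshold: float = 0.35) -> bool:
    """True when some window pair shares more than `threshold` identity."""
    return max_window_identity(query, allergen, window) > threshold
```

The brute-force double loop is quadratic in sequence length, which is acceptable for a demonstration but not for screening against a full allergen database.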

“…Another representative technique is the machine learning (ML) approach where the classification criterion is determined by the ML model by learning from a separate training data set. Besides the selection of modeling method, the most challenging task in the ML approach is the representation or encoding of protein/peptide sequences into a numerical vector/matrix. Various encoding methods, including amino acid descriptors, amino acid composition (AAC), pseudoamino acid composition (PseAAC), dipeptide composition (DPC), amino acid descriptors (AAD), position-specific scoring matrix (PSSM), physicochemical descriptors, biomedical properties, k-mer dictionary-based binary representation, etc., have been widely used in predicting allergenicity and other properties/bioactivities. However, these features may not always accurately represent protein sequences and simple combinations can cause high-dimensional problems as well as the feature redundancy…”

Section: Introduction (mentioning)

confidence: 99%
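Two of the encodings named in the statement above, amino acid composition (AAC) and dipeptide composition (DPC), are simple enough to sketch directly. The code below is a minimal illustration, not taken from any of the cited papers: the function names are hypothetical, and skipping nonstandard residues is an assumption made here for simplicity.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(sequence: str) -> list[float]:
    """20-dimensional vector of residue frequencies (nonstandard residues ignored)."""
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def dpc(sequence: str) -> list[float]:
    """400-dimensional vector of adjacent-pair frequencies.

    A pair is counted only when both of its residues are standard.
    """
    seq = sequence.upper()
    counts: dict[str, int] = {}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair[0] in AMINO_ACIDS and pair[1] in AMINO_ACIDS:
            counts[pair] = counts.get(pair, 0) + 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts.get(d, 0) / total for d in DIPEPTIDES]
```

Concatenating the two vectors gives 420 features per sequence, which hints at the dimensionality and redundancy concern the statement raises about simple feature combinations.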

pLM4Alg: Protein Language Model-Based Predictors for Allergenic Proteins and Peptides

Du, Xu, Liu et al. 2023

J. Agric. Food Chem.


The rising prevalence of allergy demands efficient and accurate bioinformatic tools to expedite allergen identification and risk assessment while also reducing wet experiment expenses and time. Recently, pretrained protein language models (pLMs) have successfully predicted protein structure and function. However, to our best knowledge, they have not been used for predicting allergenic proteins/peptides. Therefore, this study aims to develop robust models for allergenic protein/peptide prediction using five pLMs of varying sizes and systematically assess their performance through fine-tuning with a convolutional neural network. The developed pLM4Alg models have achieved state-of-the-art performance with accuracy, Matthews correlation coefficient, and area under the curve scoring 93.4−95.1%, 0.869−0.902, and 0.981−0.990, respectively. Moreover, pLM4Alg is the first model capable of handling prediction tasks involving residue-missed sequences and sequences containing nonstandard amino acid residues. To facilitate easy access, a user-friendly web server (https://f6wxpfd3sh.us-east-1.awsapprunner.com) has been established. pLM4Alg is expected to become the leading machine learning-based prediction model for allergenic peptides and proteins. Its collaboration with other predictors holds great promise for accelerating allergy research.
