2023
DOI: 10.1021/acsomega.2c02842

Sequence-Based Prediction of Plant Allergenic Proteins: Machine Learning Classification Approach

Abstract: This Article proposes a novel chemometric approach to understanding and exploring the allergenic nature of food proteins. Using machine learning methods (supervised and unsupervised), this work aims to predict the allergenicity of plant proteins. The strategy is based on scoring descriptors and testing their classification performance. Partitioning was based on support vector machines (SVM), and a k-nearest neighbor (KNN) classifier was applied. A fivefold cross-validation approach was used to validate the KNN…
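
As a rough illustration of the validation step the abstract mentions, the sketch below runs a fivefold cross-validation of a k-nearest neighbor classifier using only the Python standard library. It is not the authors' pipeline: the function names (`knn_predict`, `fivefold_accuracy`), the choice of Euclidean distance, `k=3`, and the two-dimensional toy data are all assumptions made for illustration, and the SVM-based partitioning and the paper's descriptors are not reproduced.

```python
import math
import random


def knn_predict(train, query, k=3):
    """Majority vote among the k training points nearest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def fivefold_accuracy(data, k=3, seed=0):
    """Mean accuracy over five folds; every sample is tested exactly once."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    # Interleaved slicing gives five folds of (nearly) equal size.
    folds = [shuffled[i::5] for i in range(5)]
    scores = []
    for i, test_fold in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        hits = sum(knn_predict(train, x, k) == y for x, y in test_fold)
        scores.append(hits / len(test_fold))
    return sum(scores) / len(scores)


# Hypothetical toy data: two well-separated clusters standing in for
# descriptor vectors of "non-allergen" (0) and "allergen" (1) proteins.
toy = [((0.1 * i, 0.1 * i), 0) for i in range(10)]
toy += [((5 + 0.1 * i, 5 + 0.1 * i), 1) for i in range(10)]

print(fivefold_accuracy(toy))
```

With real descriptor vectors the accuracy would depend on the features and on `k`; the toy clusters here are deliberately trivial to separate.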

Cited by 9 publications
(9 citation statements)
References 37 publications
“…This assured the universality of ESM-2 pLMs in handling our allergenic protein/peptide data sets. Moreover, since the data set contains nonstandard amino acids ([BOUZ]), the resulting ESM-2 pLMs have the ability to encode protein sequences with these residues, whereas the sequences with nonstandard residues were generally removed during model development in previous studies. Furthermore, even if a given sequence has absent residues at a few positions, ESM-2 pLMs can still encode the sequence because of its inherent capacity from the masked language model strategy. …”
Section: Results (mentioning)
confidence: 99%
“…In previous studies where global descriptors (e.g., AAC, DPC, PSSM, etc.) were used for sequence embeddings, the sequential information on proteins/peptides was lost, thereby hindering further improvement in prediction model performance. For example, a tripeptide “ALL” encoded by AAC would be indistinguishable from the tripeptide “LLA”. It can be observed in the study of Sharma et al., where the pure machine learning models only achieved 84% accuracy with AAC as the descriptor.…”
Section: Results (mentioning)
confidence: 99%
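
The “ALL” versus “LLA” point in the statement above can be checked with a few lines of code. The sketch below computes amino acid composition (AAC) as the fraction of each of the 20 standard residues in a sequence; the function name `aac` and the residue ordering in `AMINO_ACIDS` are illustrative choices, not taken from any of the cited papers.

```python
from collections import Counter

# The 20 standard amino acids, in an arbitrary but fixed order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence: str) -> list[float]:
    """Amino acid composition: fraction of each standard residue."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence.upper())
    # Counter returns 0 for residues that do not occur in the sequence.
    return [counts[aa] / len(sequence) for aa in AMINO_ACIDS]


# Both tripeptides contain one A and two L, so their AAC vectors match
# even though the residue order differs.
print(aac("ALL") == aac("LLA"))
```

Because the vector depends only on residue counts, any permutation of a sequence yields the same encoding, which is the loss of sequential information the statement describes.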
“…The approach is based on rating descriptors and evaluating their classification performance. It is necessary to create a reliable and effective protein categorization system in order to overcome the issue of food allergies (Nedyalkova et al., 2023). This information can then be used to guide the selection of plant varieties and to design breeding strategies that reduce the expression of allergenic proteins or to develop targeted approaches for modifying allergenic proteins.…”
Section: Phenomics and Omics Approaches To Reducing Allergens (mentioning)
confidence: 99%