Muhammad S. Zanaty scite author profile

Muhammad S. Zanaty

3 Publications

15 Citation Statements Received

53 Citation Statements Given

Affiliations

Cairo University

Publications

Computational predictions for protein sequences of COVID-19 virus via machine learning algorithms

Afify

Zanaty

2021

Med Biol Eng Comput

The rapid spread of coronavirus disease (COVID-19) has become a worldwide pandemic and affected more than 15 million patients reported in 27 countries. Therefore, the computational biology of this virus and its correlation with the human population urgently need to be understood. In this paper, the classification of the human protein sequences of COVID-19, according to the country, is presented based on machine learning algorithms. The proposed model distinguishes 9238 sequences in three stages: data preprocessing, data labeling, and classification. In the first stage, data preprocessing converts the amino acids of COVID-19 protein sequences into eight groups of numbers based on the amino acids' volume and dipole. This conversion is based on the conjoint triad (CT) method. In the second stage, two methods are used to label data from 27 countries from 0 to 26. The first method selects one number for each country according to the countries' code numbers, while the second method uses binary elements for each country. In the last stage, machine learning algorithms are used to distinguish COVID-19 protein sequences according to their countries. The obtained results demonstrate 100% accuracy, 100% sensitivity, and 90% specificity via the country-based binary labeling method with a linear support vector machine (SVM) classifier. Furthermore, with its large amount of infection data, the USA is more likely to be classified correctly than countries with fewer data. The unbalanced data for COVID-19 protein sequences is considered a major issue, especially as the data available from the USA represents 76% of the total of 9238 sequences. The proposed model will act as a prediction tool for the COVID-19 protein sequences in different countries.
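
The preprocessing and labeling stages described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the seven residue classes commonly used with the conjoint triad method, plus a hypothetical class 0 for any other symbol so that eight groups result, and it reads "binary elements for each country" as a one-hot vector. The function names and the three-country list are illustrative only.

```python
from itertools import product

# Seven residue classes commonly used with the conjoint triad method,
# grouped by side-chain volume and dipole. ASSUMPTION: the abstract mentions
# eight groups but does not list them; here class 0 catches any other symbol.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
RESIDUE_TO_CLASS = {
    aa: idx for idx, group in enumerate(CLASSES, start=1) for aa in group
}
N_CLASSES = len(CLASSES) + 1  # 7 standard classes + catch-all class 0


def encode(sequence):
    """Map each amino-acid letter to its class number (0 if unrecognised)."""
    return [RESIDUE_TO_CLASS.get(aa, 0) for aa in sequence.upper()]


def triad_counts(sequence):
    """Count every window of three consecutive class numbers.

    Returns a fixed-length vector (N_CLASSES ** 3 entries) ordered
    lexicographically by triad, usable as classifier input features.
    """
    codes = encode(sequence)
    counts = {t: 0 for t in product(range(N_CLASSES), repeat=3)}
    for i in range(len(codes) - 2):
        counts[tuple(codes[i:i + 3])] += 1
    return [counts[t] for t in sorted(counts)]


def integer_label(country, countries):
    """First labeling scheme: one integer code per country (0 .. n-1)."""
    return countries.index(country)


def binary_label(country, countries):
    """Second labeling scheme, read here as a one-hot vector (ASSUMPTION)."""
    return [1 if c == country else 0 for c in countries]


if __name__ == "__main__":
    countries = ["Egypt", "USA", "China"]  # hypothetical subset of the 27
    print(encode("MKX"))
    print(sum(triad_counts("AGVAG")))
    print(integer_label("USA", countries), binary_label("USA", countries))
```

The resulting feature vectors and labels could then be passed to any standard classifier, such as a linear SVM; which library and settings the authors used is not stated here.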

A Comparative Study of Protein Sequences Classification-Based Machine Learning Methods for COVID-19 Virus against HIV-1

Afify

Zanaty

2021

Applied Artificial Intelligence

Computational Predictions for Protein Sequences of COVID-19 Virus via Machine Learning Algorithms

Afify

Zanaty

2020

Preprint

The coronavirus infection is rapidly evolving into an international epidemic, a serious respiratory disease now reported in 27 countries. Therefore, the computational biology of this virus and its correlation with the human population urgently need to be understood. In this paper, the human protein sequences of COVID-19 are classified by country using machine learning algorithms. The proposed model distinguishes 9238 sequences in three stages: data preprocessing, data labeling, and classification. In the first stage, data preprocessing converts the amino acids of COVID-19 protein sequences into eight groups of numbers based on the volume and dipole of the amino acids. In the second stage, two methods are used to label data from 27 countries from 0 to 26. The first method selects one number for each country according to the countries' code numbers, while the second method uses only binary elements for each country. Classification algorithms are then run to distinguish COVID-19 protein sequences according to their countries. The findings show that 100% accuracy is achieved by the country-based binary labeling method with Linear Regression (LR), K-Nearest Neighbor (KNN), or Support Vector Machine (SVM) classifiers. Further, the USA, with its large number of data records, is more likely to be classified correctly than countries with fewer records. The unbalanced data for COVID-19 protein sequences is considered a major issue, especially as the data available from the USA represents 76% of the total of 9238 sequences. As a consequence, the proposed model can serve as a diagnostic bioinformatics tool for COVID-19 protein sequences across different countries.
