Predicting gene families from human DNA sequences using machine learning: a logistic regression approach
Nkgaphe Tsebesebe,
Kelvin Mpofu,
Sphumelele Ndlovu
et al.
Abstract: Machine learning is a powerful technique for analysing large-scale data and learning patterns, offering high accuracy and short processing times. In this work, a machine learning algorithm (multinomial logistic regression) is used to predict gene families from human DNA sequences. A total of 4380 sequences were converted into overlapping k-mers of length 6, producing 232 414 k-mers. The data set was split 80/20 into training and test sets, and the multinomial logistic regression model achieved a 93.9% accur…