Background: Over the last 20 years of biotechnology, the production of recombinant proteins has been a crucial bioprocess in both biopharmaceutical and research settings, in terms of human health, scientific impact and economic volume. Although rational genetic-engineering strategies have been established, protein overexpression is still an art. In particular, heterologous expression is often hindered by low production levels and frequently fails for unclear reasons. The problem is compounded by the absence of any generic solution for enhancing heterologous overexpression. For a given protein, its solubility can indicate the quality of its function. Over 30% of synthesized proteins are not soluble. Under fixed experimental conditions, such as temperature and expression host, protein solubility is ultimately determined by the protein's sequence. To date, numerous machine-learning methods have been proposed to predict protein solubility from the amino acid sequence alone. Despite 20 years of research on the matter, no comprehensive review of the published methods is available.
Results: This paper presents an extensive review of existing models for predicting protein solubility in the Escherichia coli recombinant protein overexpression system. The models are compared in terms of the datasets used, features, feature-selection methods, machine-learning techniques and prediction accuracy. A discussion of the models is provided at the end.
Conclusions: This study investigates machine-learning-based methods for predicting recombinant protein solubility in depth, so as to offer both a general and a detailed understanding for researchers in the field. Some of the models show acceptable prediction performance and convenient user interfaces. These models can be considered valuable tools for predicting the results of recombinant protein overexpression before performing laboratory experiments, thus saving labour, time and cost.
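The solubility predictors reviewed above work from the amino acid sequence alone. As a minimal sketch, assuming the kind of sequence-only descriptors such models commonly take as input (amino acid composition, length, hydropathy and an approximate net charge), the function below turns one sequence into a feature vector. The name `sequence_features` and the particular descriptors are hypothetical illustrations, not taken from any specific reviewed model; a classifier would then be trained on such vectors.

```python
from __future__ import annotations

from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values for the 20 standard residues
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_features(seq: str) -> dict[str, float]:
    """Hypothetical sequence-only descriptors of the kind solubility predictors use."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    n = len(seq)
    counts = Counter(seq)
    # Amino acid composition: fraction of each standard residue
    features = {f"frac_{aa}": counts[aa] / n for aa in AMINO_ACIDS}
    features["length"] = float(n)
    # GRAVY: mean Kyte-Doolittle hydropathy over the whole sequence
    features["gravy"] = sum(KYTE_DOOLITTLE[aa] for aa in seq) / n
    # Crude net charge per residue: (K + R) - (D + E); ignores His, termini and pKa shifts
    features["net_charge_per_residue"] = (
        counts["K"] + counts["R"] - counts["D"] - counts["E"]
    ) / n
    return features


feats = sequence_features("MKTAYIAKQR")
print(round(feats["gravy"], 2), feats["net_charge_per_residue"])
```

Keeping feature extraction separate from the model makes it easy to compare different learners on identical inputs, which is how the reviewed methods are contrasted (datasets, features, feature selection, learning technique).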
Cancer is one of the most common causes of death worldwide. Breast and genital cancers in women and prostate cancer in men constitute three of the most common cancers, so detecting and preventing these cancers are critical objectives. Recent findings indicate that some patients suffer from cancer comorbidity, and patients with comorbid conditions have a lower probability of survival than those with only one type of cancer. The importance of concomitant chronic illnesses during cancer treatment is assessed on SEER data using a number of machine-learning approaches. To improve the accuracy of survival-rate prediction for patients with cancer and with comorbid cancers, the gradient boosting ensemble method is adopted for both feature selection and modelling. The proposed method increases accuracy, reduces the error rate and shows a significant improvement in predicting survival rates in comorbid cancer compared with previously proposed models.
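The abstract does not state the loss function, base learners or hyperparameters used. As a minimal sketch, assuming binary survival labels, log-loss and depth-1 regression trees (stumps) as base learners, the code below shows how one gradient-boosting fit can serve both purposes named above: modelling, and feature selection by ranking features on the split gain they contribute. It is not the authors' implementation; the names `gradient_boost`, `select_features` and the `min_share` cut-off are hypothetical, and in practice a library such as scikit-learn's `GradientBoostingClassifier` (with its `feature_importances_` attribute) would normally be used.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stump(X, r):
    """Depth-1 regression tree on residuals r: (gain, feature, threshold, left, right) or None."""
    n, total = len(X), sum(r)
    best = None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += r[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between equal feature values
            right_sum = total - left_sum
            # Reduction in squared error obtained by splitting here
            gain = left_sum**2 / k + right_sum**2 / (n - k) - total**2 / n
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best


def gradient_boost(X, y, n_rounds=50, learning_rate=0.1):
    """Log-loss gradient boosting with stumps; assumes both classes occur in y."""
    p0 = sum(y) / len(y)
    f0 = math.log(p0 / (1 - p0))  # initial log-odds
    F = [f0] * len(y)
    stumps, importance = [], [0.0] * len(X[0])
    for _ in range(n_rounds):
        # Negative gradient of log-loss with respect to the log-odds
        r = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        stump = fit_stump(X, r)
        if stump is None:
            break
        gain, j, t, lv, rv = stump
        importance[j] += gain
        stumps.append((j, t, lv, rv))
        F = [fi + learning_rate * (lv if x[j] <= t else rv) for fi, x in zip(F, X)]
    total = sum(importance) or 1.0
    importance = [v / total for v in importance]

    def predict_proba(x):
        z = f0 + sum(learning_rate * (lv if x[j] <= t else rv) for j, t, lv, rv in stumps)
        return sigmoid(z)

    return predict_proba, importance


def select_features(names, importance, min_share=0.05):
    """Keep features whose share of the total split gain reaches min_share."""
    return [name for name, share in zip(names, importance) if share >= min_share]


# Synthetic toy data: feature 0 determines the label, feature 1 is noise.
X = [[float(i), float((7 * i) % 5)] for i in range(20)]
y = [1 if i >= 10 else 0 for i in range(20)]
predict_proba, importance = gradient_boost(X, y)
print(select_features(["x_informative", "x_noise"], importance))
```

The data here is synthetic and only shows the mechanics; the selected features would then be used to refit the final survival model.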
A protein contact map is a simplified representation of a protein's spatial structure. The committee machine is a machine-learning method that distributes the learning task among a number of learners and divides the input space into subspaces. The learners' responses to an input are combined to produce the system's final response, which is more accurate than the response of any single learner. In this study, we propose a novel committee-machine-based method for contact map prediction, called CMP_model. Compared with two other models, the proposed model shows a considerable gain (an accuracy improvement from 0.05 to 0.15).
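To make the two ideas above concrete, here is a minimal sketch under stated assumptions. A contact map is built from residue coordinates using an 8 Å distance cut-off (a common convention, assumed here), and a committee combines several learners by averaging their contact probabilities and thresholding the mean. This shows only static ensemble averaging; the input-space division and the learners used by CMP_model are not described in the abstract, so the three learners below are hypothetical stand-ins with deliberately different biases, and the coordinates are made up.

```python
import math
from typing import Callable, List, Sequence

# A learner maps a residue pair (i, j) to a contact probability in [0, 1].
Learner = Callable[[int, int], float]


def contact_map(coords: Sequence[Sequence[float]], threshold: float = 8.0) -> List[List[int]]:
    """Binary contact map: 1 where residues i and j lie closer than `threshold` angstroms."""
    n = len(coords)
    return [[1 if math.dist(coords[i], coords[j]) < threshold else 0 for j in range(n)]
            for i in range(n)]


def committee_predict(learners: Sequence[Learner], n: int, cutoff: float = 0.5) -> List[List[int]]:
    """Average the learners' probabilities for each pair, then threshold the mean."""
    pred = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            mean_p = sum(learn(i, j) for learn in learners) / len(learners)
            pred[i][j] = pred[j][i] = 1 if mean_p >= cutoff else 0
    return pred


def accuracy(pred: List[List[int]], truth: List[List[int]]) -> float:
    """Fraction of residue pairs i < j whose contact label is predicted correctly."""
    n = len(truth)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(pred[i][j] == truth[i][j] for i, j in pairs) / len(pairs)


# Toy example: six residues on a straight line, 3.8 angstroms apart (hypothetical).
coords = [(3.8 * k, 0.0, 0.0) for k in range(6)]
truth = contact_map(coords)

# Three hypothetical learners with different biases on sequence separation.
learners = [
    lambda i, j: 1.0 if abs(i - j) <= 2 else 0.0,
    lambda i, j: 1.0 if abs(i - j) <= 3 else 0.0,  # over-predicts contacts
    lambda i, j: 1.0 if abs(i - j) <= 1 else 0.0,  # under-predicts contacts
]
for k, learn in enumerate(learners):
    print(f"learner {k}: {accuracy(committee_predict([learn], len(coords)), truth):.3f}")
print(f"committee: {accuracy(committee_predict(learners, len(coords)), truth):.3f}")
```

In this constructed case the over- and under-predicting learners err on different pairs, so averaging cancels their mistakes; that error diversity is what a committee relies on, and it is not guaranteed for real predictors.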