Demand for special-purpose proteins is increasing significantly. These proteins are generally produced as recombinant proteins, but their purification is costly and difficult. It is therefore important to develop a method for estimating the chance of successful purification beforehand, so that the prospects of the proteins in question can be assessed. The purification of a protein should be related to its intrinsic properties, including its 3D structure, and around 540 amino acid properties have been documented so far. It is thus possible to test each amino acid property against the success rate of protein purification to find out which properties are best suited to estimating purification propensity. In this study, each of 535 properties was tested against 438 purified proteins and 429 proteins that could not be purified from Bacillus halodurans, using logistic regression and a neural network model. ROC analysis was applied to the resulting sensitivity and specificity. The results show that amino acid composition properties were generally less helpful for estimating purification propensity, whereas amino acid physicochemical properties, secondary structures and dynamic properties were more useful, with dynamic properties the most promising of these. Several types of protein properties can therefore serve to determine the purification propensity of proteins, and have the potential to reduce costs and speed up production in the microbiological and biotechnological fields.
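As an illustration of the evaluation step summarized above, the following minimal sketch computes accuracy, sensitivity, specificity and the area under the ROC curve from a classifier's predicted probabilities. It is not the study's implementation: the function names, the 0.5 decision threshold and the toy data are assumptions made for the example, and the scores stand in for the output of a logistic regression fitted on a single amino acid property.

```python
def confusion_metrics(labels, scores, threshold=0.5):
    """Return (accuracy, sensitivity, specificity) at a score threshold.

    labels: 1 = purified, 0 = not purified.
    scores: predicted probabilities of successful purification.
    The 0.5 threshold is an assumption, not a value from the study.
    """
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # fraction of purified proteins recovered
    specificity = tn / (tn + fp)  # fraction of unpurified proteins rejected
    return accuracy, sensitivity, specificity


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen purified protein
    scores higher than a randomly chosen unpurified one (ties count 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]

acc, sens, spec = confusion_metrics(labels, scores)
auc = roc_auc(labels, scores)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} "
      f"specificity={spec:.3f} AUC={auc:.3f}")
```

Repeating such an evaluation once per property would give one accuracy, sensitivity, specificity and AUC value for each of the 535 properties, which is the kind of per-property comparison the abstract describes. The sketch assumes both classes are present; it would divide by zero otherwise.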
Purification Propensity for Proteins from Bacillus halodurans
Results and Discussion
It is important for the biotechnology industry to produce large quantities of highly stable, purified recombinant proteins, which provide economically affordable sources for clinical and industrial applications and for research. Purification is usually laborious and tedious, even though various expression approaches, such as codon optimization, have been employed successfully. This is why amino acid properties were analyzed: to find out which property could provide a clue to the chance of successful purification.
The upper panel of Figure 1 showed the accuracy, sensitivity and specificity obtained from logistic regression, which was used to find out which of the 535 amino acid properties were useful for estimating the purification propensity of 857 proteins from B. halodurans. In this figure, the x-axis indicated each of the 535 amino acid properties (Supplementary Material), while the y-axis indicated the accuracy, sensitivity and specificity. At first glance, the specificity was the best, followed by the accuracy and then the sensitivity. Moreover, little difference appeared among the 535 amino acid properties, because the specificity, accuracy and sensitivity were colored similarly; it was nevertheless necessary to pick out the poorly performing amino acid properties, which were colored blue in the upper panel of Figure 1. The amino acid properties related to electric charge were not good at estimating the chance of protein purification.
The lower panel of Figure 1 displayed the receiver operating characterist...