We propose a new approach to predicting the functional specificity of proteins from their amino acid sequences. The approach combines structural Multilevel Neighborhoods of Atoms (MNA) descriptors with an original Bayesian algorithm. A protein sequence is usually represented as a string of amino acid symbols. Here we introduce a new representation of an amino acid sequence: a set of structural MNA descriptors. An MNA descriptor is a string that describes an atom and its neighboring atoms up to a selected level. For comparison, we also represent a protein sequence as a set of peptides (strings of amino acid symbols). We performed a case study on two sub-subclasses of the Enzyme Commission (EC) nomenclature. The B-statistics provide sufficient predictive power for enzyme specificity with both MNA descriptors and peptides. MNA descriptors give higher accuracy than peptides, and the MNA descriptor level can be chosen to maximize prediction accuracy. The highest average prediction accuracy achieved was 0.98.
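As a rough illustration of the peptide representation mentioned above, the following Python sketch turns a sequence into a set of peptides. The abstract does not say how the peptides are extracted, so the overlapping sliding window, the default peptide length of 3, and the function name `peptide_set` are all assumptions made for this example only, not the authors' procedure.

```python
def peptide_set(sequence: str, length: int = 3) -> set[str]:
    """Return the set of overlapping peptides of a fixed length.

    Assumption: peptides are taken with a sliding window of step 1.
    The peptide length is a hypothetical parameter, not a value from the paper.
    """
    if length < 1:
        raise ValueError("length must be a positive integer")
    # A sequence shorter than the window yields an empty set.
    return {sequence[i:i + length] for i in range(len(sequence) - length + 1)}


# Toy sequence for illustration; it is not a real protein.
toy_peptides = peptide_set("MKVLAAGMKV", length=3)
print(sorted(toy_peptides))
```

Because the result is a set, a peptide that occurs more than once in the sequence (here `MKV`) is stored only once, which matches the "set of peptides" wording but discards position and count information.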