A hybrid evolutionary model is used to propose hierarchical homologies among protein sequences in order to identify protein functions systematically. Given the inconsistent performance of existing methods in predicting the functions of novel proteins, the proposed model offers considerable potential. Because some novel proteins may align with known sequences without sharing meaningful conserved domains, maximizing the sequence alignment score is not the best criterion for predicting protein functions. This work presents a decision model that uses hierarchical homologies to minimize the cost of the decisions made when predicting protein functions. Specifically, the model has three characteristics: (i) it is a hybrid evolutionary model with multiple fitness functions that uses genetic programming to predict protein functions in a distantly related protein family; (ii) it incorporates a modified robust point matching method that accurately compares all feature points using moment invariants and thin-plate splines; and (iii) it represents the hierarchical homologies supporting a novel protein sequence as a causal tree, which effectively shows the relationships between proteins. Using the model, this work compares nucleocapsid proteins from the putative SARS virus polyprotein with those of other coronaviruses in other hosts.
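The abstract does not say how feature points are derived from protein sequences or which moment invariants the modified robust point matching uses. As a minimal sketch, assuming each protein has already been mapped to a set of 2-D feature points, the Python below computes two classic Hu-style invariants from second-order central moments. These values do not change under translation or rotation, so two point sets can be compared by the distance between their invariant vectors. All helper names (`central_moment`, `moment_invariants`, `invariant_distance`, `rigid_transform`) and the example coordinates are hypothetical, and the thin-plate spline warping step is not shown.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def central_moment(points: Sequence[Point], p: int, q: int) -> float:
    """Central moment mu_pq of a 2-D point set, averaged over the points."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return sum((x - cx) ** p * (y - cy) ** q for x, y in points) / n


def moment_invariants(points: Sequence[Point]) -> Tuple[float, float]:
    """First two Hu-style invariants; unchanged by translation and rotation."""
    m20 = central_moment(points, 2, 0)
    m02 = central_moment(points, 0, 2)
    m11 = central_moment(points, 1, 1)
    phi1 = m20 + m02                          # total spread (trace of covariance)
    phi2 = (m20 - m02) ** 2 + 4 * m11 ** 2    # squared eigenvalue gap
    return phi1, phi2


def invariant_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Euclidean distance between the invariant vectors of two point sets."""
    return math.dist(moment_invariants(a), moment_invariants(b))


def rigid_transform(points: Sequence[Point], angle: float,
                    dx: float, dy: float) -> List[Point]:
    """Hypothetical helper: rotate points by `angle` radians, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in points]


if __name__ == "__main__":
    # Hypothetical feature points; the source does not describe their origin.
    shape = [(0.0, 0.0), (2.0, 0.5), (3.0, 2.0), (1.0, 3.0)]
    moved = rigid_transform(shape, angle=0.7, dx=5.0, dy=-2.0)
    other = [(0.0, 0.0), (4.0, 0.0), (4.0, 1.0), (0.0, 1.0)]
    print(invariant_distance(shape, moved))  # near zero: same shape, moved
    print(invariant_distance(shape, other))  # clearly non-zero: different shape
```

Because only second-order moments are used, these invariants summarize overall spread and elongation rather than fine shape detail; in a robust point matching pipeline they could act as a cheap pre-filter before point-to-point correspondence and spline fitting.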
The expression of genes in mammalian cells can be constitutive, transient, or inducible. Transcripts of transient and inducible genes are difficult to discover using the EST approach. Transiently expressed genes, however, are crucial to embryonic development and disease pathogenesis because they determine the outcome of disease. We aimed to identify novel exons in transiently expressed genes using a new bioinformatics approach, which we believe will facilitate the verification of novel transcripts in developing embryos or pathogen-induced cells. First, the method uses a general gene predictor that must be able to produce all possible optimal and suboptimal candidate exons in the human genome. Next, after signal processing, an anchoring procedure rapidly transforms the candidate sequences into numeric hashing signals and groups them into clusters. Meanwhile, an entropy-based theorem is used to remove most erroneous matches, namely repeat matches. Finally, the method reports the exons identified by cross-species alignment with other genomic or EST sequences. Our results indicated that 3,223 filtered target exons were potentially novel. The theoretical threshold for filtering repeat matches, determined computationally, had 95.3% sensitivity and 81.8% specificity. The inferential threshold was close to the experimental threshold, which is a practical value that balances sensitivity and specificity. These results demonstrate the feasibility of the method. Combining the anchoring method and its embedded entropy-based filter with an inherently unreliable gene predictor yields a small set of exons that may be novel, because the combination avoids many drawbacks of some traditional gene predictors.
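The abstract does not specify the hashing scheme or the form of the entropy-based theorem, so the following is a minimal Python sketch under stated assumptions, not the authors' method. It encodes each k-mer as a base-4 integer as a stand-in for the numeric hashing signal, groups candidate exons that share a k-mer anchor into clusters, and discards matches whose dinucleotide Shannon entropy falls below a threshold, since low-complexity repeats have skewed composition. The function names, the A/C/G/T encoding, and the 2.5-bit threshold are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List

# Assumed encoding: only uppercase A/C/G/T are handled; other symbols raise KeyError.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_hashes(seq: str, k: int) -> List[int]:
    """Encode every k-mer of seq as a base-4 integer (a simple numeric signal)."""
    hashes = []
    for i in range(len(seq) - k + 1):
        value = 0
        for base in seq[i:i + k]:
            value = value * 4 + BASE_CODE[base]
        hashes.append(value)
    return hashes


def group_by_anchor(candidates: Dict[str, str], k: int) -> Dict[int, List[str]]:
    """Cluster candidate exons under every k-mer hash (anchor) they contain."""
    clusters: Dict[int, List[str]] = defaultdict(list)
    for name, seq in candidates.items():
        for h in set(kmer_hashes(seq, k)):
            clusters[h].append(name)
    return dict(clusters)


def shannon_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy, in bits, of the k-mer composition of seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def filter_repeat_matches(matches: List[str], threshold: float,
                          k: int = 2) -> List[str]:
    """Keep only matches whose k-mer entropy reaches the (hypothetical) threshold."""
    return [m for m in matches if shannon_entropy(m, k) >= threshold]


if __name__ == "__main__":
    candidates = {"exon_a": "ACGTA", "exon_b": "TTACG"}
    anchor = kmer_hashes("ACG", 3)[0]
    print(group_by_anchor(candidates, 3)[anchor])  # ['exon_a', 'exon_b']
    matches = ["ATATATATATAT", "ACGTTGCAAGCTTAGC"]
    print(filter_repeat_matches(matches, threshold=2.5))  # ['ACGTTGCAAGCTTAGC']
```

In practice such a threshold would be tuned against labeled matches by trading sensitivity against specificity, which is the trade-off that the reported 95.3% sensitivity and 81.8% specificity describe.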