We developed a non-parametric method, Information Decomposition (ID), for analyzing the content of any symbolic sequence. The method is based on calculating the Shannon mutual information between the analyzed sequence and artificial symbolic sequences, and it reveals latent periodicity in any symbolic sequence. We show that the ID method remains stable when a large number of random letter changes are introduced into the analyzed sequence. We demonstrate the capabilities of the method by analyzing poems as well as DNA and protein sequences. We show that many DNA and amino acid sequences have latent periodicity of different types and lengths. The possible origin of latent periodicity in different symbolic sequences is discussed.
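The core quantity described above can be illustrated with a minimal sketch. It assumes that the artificial sequence for a trial period n is simply the repeating index 1, 2, ..., n, and it computes the mutual information (in nats) between that index and the letters of the analyzed sequence. The function names `periodicity_information` and `information_spectrum` are hypothetical, and the abstract does not say how the authors assess statistical significance, so that step is left out.

```python
import math
from collections import Counter

def periodicity_information(seq, period):
    """Mutual information (nats) between the letters of `seq` and the
    position index (i mod period) of an artificial periodic sequence."""
    n = len(seq)
    joint = Counter((i % period, ch) for i, ch in enumerate(seq))
    positions = Counter(i % period for i in range(n))
    letters = Counter(seq)
    info = 0.0
    for (pos, ch), count in joint.items():
        # p(pos, ch) * log( p(pos, ch) / (p(pos) * p(ch)) )
        info += (count / n) * math.log(count * n / (positions[pos] * letters[ch]))
    return info

def information_spectrum(seq, max_period):
    """Mutual information for every trial period from 2 to max_period."""
    return {p: periodicity_information(seq, p) for p in range(2, max_period + 1)}

if __name__ == "__main__":
    # A perfectly periodic toy sequence: the value peaks at multiples of 3.
    for period, value in information_spectrum("ABC" * 20, 7).items():
        print(period, round(value, 3))
```

For a sequence whose letters are fully determined by the position within the period, the value equals the entropy of the letter distribution; for a trial period unrelated to the true one it falls toward zero.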
In this study, we developed a new mathematical method for performing multiple alignment of highly divergent sequences (MAHDS), i.e., sequences that have on average more than 2.5 substitutions per position (x). We generated sets of artificial DNA sequences with x ranging from 0 to 4.4 and applied MAHDS as well as currently used multiple sequence alignment algorithms, including ClustalW, MAFFT, T-Coffee, Kalign, and Muscle to these sets. The results indicated that most of the existing methods could produce statistically significant alignments only for the sets with x < 2.5, whereas MAHDS could operate on sequences with x = 4.4. We also used MAHDS to analyze a set of promoter sequences from the Arabidopsis thaliana genome and discovered many conserved regions upstream of the transcription initiation site (from −499 to +1 bp); a part of the downstream region (from +1 to +70 bp) also significantly contributed to the obtained alignments. The possibilities of applying the newly developed method for the identification of promoter sequences in any genome are discussed. A server for multiple alignment of nucleotide sequences has been created.
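To give a feel for what x substitutions per position means, the following sketch builds a small artificial family from a random ancestor. The abstract does not describe how the authors generated their sets, so the procedure here (uniformly random positions, each hit replacing the base with a different one, repeated hits allowed) is an assumption, and the names `diverge` and `identity` are hypothetical.

```python
import random

BASES = "ACGT"

def diverge(ancestor, x, rng):
    """Apply round(x * len(ancestor)) random base substitutions.

    Positions are drawn uniformly with replacement, so a position can be
    hit several times; every hit changes the base to a different one.
    """
    seq = list(ancestor)
    for _ in range(round(x * len(seq))):
        i = rng.randrange(len(seq))
        seq[i] = rng.choice([b for b in BASES if b != seq[i]])
    return "".join(seq)

def identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(p == q for p, q in zip(a, b)) / len(a)

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    ancestor = "".join(rng.choice(BASES) for _ in range(600))
    for x in (0.0, 1.0, 2.5, 4.4):
        descendant = diverge(ancestor, x, rng)
        print(x, round(identity(ancestor, descendant), 3))
```

Under this model, identity to the ancestor approaches the 25% expected between unrelated DNA sequences as x grows, which illustrates why highly divergent sets are hard to align.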
The definition of a phase shift of triplet periodicity (TP) is introduced. A mathematical algorithm for detecting TP phase shifts in nucleotide sequences has been developed. Gene sequences from the Kegg-46 data bank were analyzed to search for genes with a TP phase shift. A phase shift of triplet periodicity was found in 318329 genes (~10% of the genes in Kegg-46). We suppose that shifts of the TP phase may indicate shifts of the reading frame (RF) in genes. The relationship between TP phase shifts and frame shifts in genes is discussed.
Introduction
Mutations in gene sequences arise as substitutions, deletions and insertions of DNA bases, and as deletions, insertions and inversions of whole DNA fragments [1-2]. Substitutions of DNA bases can induce amino acid substitutions in a protein, and a substitution of one base can change only one amino acid in the amino acid sequence. These amino acid alterations very often have a strong influence on protein structure and on the ability of the protein to perform its biological function [3]. However, deletions or insertions of DNA bases can change a long stretch of the amino acid sequence by shifting the reading frame (RF) when the size of the deletion or insertion is not divisible by 3. The amino acid sequence downstream of the RF shift point is changed. In this sense, deletions and insertions can be considered more important evolutionary events than base substitutions.
The influence of RF shifts on protein function has been studied relatively poorly because RF shifts are difficult to detect. Nevertheless, it is very interesting to study this influence. If a protein does not lose its biological function after an RF shift, two hypotheses can be suggested. First, the RF shift may change an unimportant part of the protein, so that the change in amino acid sequence does not influence protein function. Second, the RF shift could create an amino acid sequence with a similar function. It is very interesting to find the rules of amino acid changes that allow new amino acid sequences to be created with the same function as the initial one. If the biological function of the amino acid sequence has changed upon the RF shift, it is very interesting to reveal the types of sequence changes that have led to the development of a new biological function of the protein. The results of such studies could be used for designing artificial proteins with the necessary biological functions.
A better understanding of the influence of RF shifts on protein structure and function will be possible if we develop a mathematical method for better detection of RF shifts in known gene sequences. Currently, the main method of searching for RF shifts is based on searching for similarities between amino acid sequences with the help of the BLAST program or similar programs [4][5][6][7][8]. To search for a similarity we should find the gene region in which we suppose the ...