Chargaff's rule of intra-strand parity (ISP) between complementary mono/oligonucleotides in chromosomes is well established in the scientific literature. Although a large number of papers discussing ISP have been published in the genomic era, scientists have yet to identify all the factors responsible for such a universal phenomenon in chromosomes. In the present work, we address the issue from a new perspective, a feature parallel to ISP. The compositional abundance values of mono/oligonucleotides were determined in all non-overlapping sub-chromosomal regions of a specific size, and the frequency distributions of the mono/oligonucleotides among the regions were compared using the Kolmogorov–Smirnov test. Interestingly, the frequency distributions of complementary mono/oligonucleotides were statistically similar, a feature we named intra-strand frequency distribution parity (ISFDP). ISFDP was observed as a general feature in chromosomes of bacteria, archaea and eukaryotes, although violation of ISFDP was also observed in several chromosomes. Chromosomes of different strains belonging to a species in bacteria/archaea (Haemophilus influenzae, Xylella fastidiosa etc.) and the different chromosomes of a eukaryote were found to differ from each other with respect to ISFDP violation. ISFDP correlates weakly with ISP in chromosomes, suggesting that the latter is not entirely responsible for the former. Asymmetry of replication topography and of the composition of forward-encoded sequences between the strands was found to be insufficient to explain the ISFDP feature in all chromosomes. This suggests that multiple factors in chromosomes are responsible for establishing ISFDP.
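The windowed comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the window size, and the toy sequence are assumptions made for illustration, and only the Kolmogorov–Smirnov statistic D is computed (no p-value). In practice a library routine such as `scipy.stats.ks_2samp` would likely be used instead of the hand-written statistic.

```python
from bisect import bisect_right


def window_fractions(seq, base, window):
    """Fraction of `base` in each full non-overlapping window of `seq`.

    A trailing partial window is ignored (an assumption of this sketch).
    """
    seq = seq.upper()
    n_windows = len(seq) // window
    return [
        seq[i * window:(i + 1) * window].count(base) / window
        for i in range(n_windows)
    ]


def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_x(v) - F_y(v)|.

    The empirical CDFs are step functions, so checking every observed
    value is enough to find the maximum gap.
    """
    xs, ys = sorted(xs), sorted(ys)
    d = 0.0
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d


# Toy usage: compare the windowed distributions of A and its complement T.
toy_seq = "AATT" * 5  # hypothetical sequence, not real genomic data
a_fracs = window_fractions(toy_seq, "A", window=4)
t_fracs = window_fractions(toy_seq, "T", window=4)
d_value = ks_statistic(a_fracs, t_fracs)
```

A small D indicates similar distributions of the two complementary nucleotides across windows; the same functions extend to oligonucleotides if `str.count` is replaced by an overlapping k-mer counter.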
The different triplets encoding the same amino acid, termed synonymous codons, are not equally abundant in a genome. Factors such as G + C% and tRNA are known to influence their abundance. However, the order of the nucleotides in each codon per se might be another factor affecting its abundance. Of the synonymous codons for a given amino acid, some are preferentially used in highly expressed genes and are referred to as ‘optimal codons’ (OCs). In this study, we compared the OCs of the 18 amino acids in 221 species of bacteria. We observed an amino acid-specific influence on the selection of OCs. Phylogeny also influences the choice of OCs for some amino acids, such as Glu, Gln, Lys and Leu. The phenomenon of codon bias is also supported by comparative studies of the abundance values of synonymous codons with the same G + C content. It is likely that the order of the nucleotides in the triplet codon is also involved in the phenomenon of codon usage bias in organisms.
The fourfold degenerate site (FDS) in coding sequences is important for studying the effect of selection pressure on codon usage bias (CUB), because a nucleotide substitution at this site leaves the amino acid sequence of the protein unaltered and is therefore not itself under such pressure. We estimated the variation in nucleotide frequency at the FDS across the eight family boxes (FBs), defined as Um(g), the unevenness measure of a gene g. The study covered 545 species of bacteria. In many bacteria, Um(g) correlated strongly with Nc', a measure of CUB. Analysis of the strongly correlated bacteria revealed that the U-ending codons (GGU, CGU) were preferred to the G-ending codons (GGG, CGG) in the Gly and Arg FBs, even in genomes with G+C% higher than 65.0. Further evidence suggested that these codons can be used as a good indicator of selection pressure on CUB in genomes with higher G+C%.
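As a rough illustration of the raw data behind such an analysis, the sketch below tallies the nucleotide at the fourfold degenerate site for each of the eight family boxes of the standard genetic code. It does not compute Um(g), whose formula is not given in this abstract; the function name and the example sequence are hypothetical.

```python
from collections import Counter

# First two bases of the eight family boxes in the standard genetic code,
# written in the DNA alphabet (T instead of U).
FAMILY_BOXES = {
    "CT": "Leu", "GT": "Val", "TC": "Ser", "CC": "Pro",
    "AC": "Thr", "GC": "Ala", "CG": "Arg", "GG": "Gly",
}


def fds_counts(cds):
    """Count third-position nucleotides per family box in an in-frame CDS."""
    cds = cds.upper().replace("U", "T")
    counts = {aa: Counter() for aa in FAMILY_BOXES.values()}
    # Step through complete codons only; a trailing partial codon is skipped.
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        aa = FAMILY_BOXES.get(codon[:2])
        if aa is not None and codon[2] in "ACGT":
            counts[aa][codon[2]] += 1
    return counts


# Toy usage on a hypothetical gene: ATG GGT GGT GGG CGT TAA
example = fds_counts("ATGGGTGGTGGGCGTTAA")
gly_t_vs_g = (example["Gly"]["T"], example["Gly"]["G"])
```

Comparing, for example, the T (U) and G counts in the Gly and Arg boxes gives the kind of U-ending versus G-ending contrast the abstract reports.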
According to the selection-mutation-drift (SMD) theory of molecular evolution, mutation predominates in determining codon usage bias (CUB) in weakly expressed genes (WEG), whereas selection predominates in determining CUB in highly expressed genes (HEG). Strand-specific mutational bias causes compositional asymmetry of nucleotides between the leading strands and lagging strands (LaS) in bacterial chromosomes. With these points in view, CUB was compared between the strands of the Escherichia coli chromosome. In comparison with HEG, the codon usage of WEG was observed to be more biased toward strands: G-ending codons were significantly more frequent in leading strands than in LaS, and the reverse was true for C-ending codons. For WEG, the GC3 skews were found to be significantly different between the strands. This suggests that strand-specific mutational bias influences the codon usage of WEG to a greater extent than that of HEG. The differential effect of strand-specific mutational bias in E. coli might be attributed to stronger purifying selection in HEG than in WEG. The observation in E. coli supports the SMD theory of molecular evolution.
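The GC3 skew mentioned in this abstract can be sketched as follows. The abstract does not state its formula, so the commonly used definition (G3 - C3) / (G3 + C3), with G3 and C3 counted at third codon positions, is assumed here; the function name and toy sequence are illustrative only.

```python
def gc3_skew(cds):
    """GC3 skew of an in-frame coding sequence.

    Assumes the usual definition (G3 - C3) / (G3 + C3), where G3 and C3
    are the counts of G and C at third codon positions.
    Returns 0.0 when there is no G or C at the third position.
    """
    third = cds.upper()[2::3]  # every third base, starting at codon position 3
    g = third.count("G")
    c = third.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


# Toy usage: codons ATG AAG AAC AAG have third positions G, G, C, G.
toy_skew = gc3_skew("ATGAAGAACAAG")
```

Computing this value per gene and grouping genes by strand (leading versus lagging) and by expression class (HEG versus WEG) would give the kind of comparison described above; that grouping is not shown here.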
The present study was undertaken to investigate the pattern of optimal codon usage in Archaea. A comparative analysis was carried out to understand the pattern of codon usage bias (CUB) between the high expression genes (HEG) and the whole genomes in two archaeal phyla, Crenarchaea and Euryarchaea. The G+C% of the HEG was found to be lower than the genome G+C% in Crenarchaea, whereas the reverse was the case in Euryarchaea. The preponderance of U/A-ending codons in the HEG of Crenarchaea was in sharp contrast to the C/G-ending ones in Euryarchaea. The analysis revealed a prevalence of U-ending codons even within the WWY (nucleotide ambiguity code) families in Crenarchaea vis-à-vis Euryarchaea, bacteria and Eukarya. No plausible interpretation of the observed disparity could be made in the context of either tRNA gene composition or genome G+C%. The results of this study indicate that the codon preference in the HEG of Crenarchaea might be different from that of Euryarchaea. The main highlights are: (i) CUB varied between the HEG and the whole genomes in Euryarchaea and Crenarchaea; (ii) Crenarchaea was found to have some unusual optimal codons (OCs) compared to other organisms; (iii) G+C% (and GC3) of the HEG were different from the genome G+C% in the two phyla; (iv) genome G+C% and tRNA gene number failed to explain CUB in Crenarchaea; (v) translational selection is possibly responsible for A+T rich OCs in Crenarchaea.