Chloroplast genomes exhibit an eight-cluster structure in triplet frequency space. The elements to be clustered are small fragments of a genome, each converted into a triplet frequency dictionary. The typical structure consists of eight clusters: six correspond to the three positions of the reading frame, shifted by 0, 1 and 2 nucleotides, on each of the two opposing strands; the seventh corresponds to the junk regions of the genome; and the eighth comprises fragments with excessive GC-content that bear specific RNA genes. The structure exhibits a specific symmetry.
Background
Previously, a seven-cluster pattern, claimed to be universal in bacterial genomes, was reported. Keeping in mind the most popular theory of chloroplast origin, we checked whether a similar pattern is observed in chloroplast genomes.
Results
Surprisingly, an eight-cluster structure has been found for chloroplasts. The pattern observed for chloroplasts differs rather significantly from the bacterial one, and from that observed for cyanobacteria. The structure is revealed by clustering fragments of equal length isolated within a genome, where each fragment is converted into a triplet frequency dictionary built from non-overlapping triplets that tile the fragment with no gaps. The resulting points in 63-dimensional space were clustered using the elastic map technique. The eighth cluster found in chloroplasts comprises the fragments of a genome that bear tRNA genes and exhibit excessively high GC-content in comparison to the entire genome.
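The conversion of fragments into triplet frequency dictionaries can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the `length`, `step` and `offset` parameters are hypothetical, and the fragment length and step used in the study are not stated here. Because the 64 triplet frequencies of a fragment sum to 1, each fragment is a point in a 63-dimensional space. The elastic map clustering itself is not shown.

```python
from collections import Counter
from itertools import product

# All 64 triplets over the four-letter nucleotide alphabet.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]


def triplet_frequencies(fragment, offset=0):
    """Frequencies of non-overlapping triplets tiled without gaps from `offset`.

    `offset` (0, 1 or 2) is an illustrative parameter selecting the frame
    in which the fragment is read.
    """
    counts = Counter(
        fragment[i:i + 3] for i in range(offset, len(fragment) - 2, 3)
    )
    # Triplets containing symbols other than A, C, G, T are ignored.
    total = sum(counts[t] for t in TRIPLETS)
    if total == 0:
        return {t: 0.0 for t in TRIPLETS}
    return {t: counts[t] / total for t in TRIPLETS}


def fragment_dictionaries(genome, length, step):
    """Cut equal-length fragments and convert each into a frequency dictionary.

    `length` and `step` are assumed values chosen by the user.
    Returns a list of (start_position, dictionary) pairs.
    """
    genome = genome.upper()
    return [
        (start, triplet_frequencies(genome[start:start + length]))
        for start in range(0, len(genome) - length + 1, step)
    ]
```

Reading the same fragment with `offset=1` or `offset=2` generally yields a different dictionary, which is why fragments covering coding regions in different frame positions can fall into different clusters.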
Conclusion
Chloroplasts exhibit a very specific type of symmetry in the distribution of coding and non-coding fragments of a genome in the space of triplet frequencies: mirror symmetry. Cyanobacteria may have both mirror symmetry and the rotational symmetry typical of other bacteria.