Photosynthetic eukaryotes, particularly unicellular forms, possess a fossil record that is either fraught with gaps or difficult to interpret, or both. Attempts to reconstruct their evolution have focused on plastid phylogeny, but were limited by the amount and type of phylogenetic information contained within single genes. Among the 210 different protein-coding genes contained in the completely sequenced chloroplast genomes from a glaucocystophyte, a rhodophyte, a diatom, a euglenophyte and five land plants, we have now identified the set of 45 common to each and to a cyanobacterial outgroup genome. Phylogenetic inference with an alignment of 11,039 amino-acid positions per genome indicates that this information is sufficient, but just barely so, to identify the rooted nine-taxon topology. We mapped the process of gene loss from chloroplast genomes across the inferred tree and found that, surprisingly, independent parallel gene losses in multiple lineages outnumber phylogenetically unique losses by more than 4:1. We identified homologues of 44 different plastid-encoded proteins as functional nuclear genes of chloroplast origin, providing evidence for endosymbiotic gene transfer to the nucleus in plants.
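The step of finding the genes common to every genome can be illustrated as a set intersection. The following is a minimal sketch, not the authors' pipeline: the genome labels and gene lists in `toy_genomes` are invented placeholders, and the helper names `shared_genes` and `missing_from` are hypothetical.

```python
def shared_genes(gene_sets):
    """Return the genes present in every genome of the mapping."""
    return set.intersection(*(set(genes) for genes in gene_sets.values()))


def missing_from(gene_sets, genome):
    """Return genes found in at least one other genome but absent from `genome`."""
    union = set().union(*(set(genes) for genes in gene_sets.values()))
    return union - set(gene_sets[genome])


# Toy gene-content table (assumption: invented data, not the published gene lists).
toy_genomes = {
    "outgroup": ["atpA", "psbA", "rbcL", "rpl2", "tufA"],
    "plastid_1": ["atpA", "psbA", "rbcL", "rpl2", "tufA"],
    "plastid_2": ["atpA", "psbA", "rbcL", "rpl2"],
    "plastid_3": ["atpA", "psbA", "rpl2"],
}

common = shared_genes(toy_genomes)
absent_in_3 = missing_from(toy_genomes, "plastid_3")
```

In this toy table the shared set is the three genes present in all four genomes, while `rbcL` and `tufA` are absent from `plastid_3`. Deciding whether such absences are parallel or unique losses additionally requires mapping them onto a tree, which this sketch does not attempt.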
Thirty-nine proteins encoded in a large gene cluster that is well conserved in gene content and gene order across 18 sequenced prokaryotic genomes were extracted, aligned and subjected to phylogenetic analysis. In individual analyses of the alignments, only two probable examples of lateral gene transfer between archaea and eubacteria were detected, involving the genes for ribosomal protein Rpl23 and adenylate kinase. Amino acid sequences for 35 of the 39 proteins were concatenated to yield a data set of 9087 amino acid positions per genome. Many of these proteins, 33 of which are ribosomal proteins, are not highly conserved across distantly related organisms and thus contain many regions that are difficult to align. Phylogenetic analyses were performed with subsets of the concatenated data from which the most highly variable sites had been iteratively removed, using the number of different amino acids that occur at a given site as a criterion of variability. Glycine, which has a strong influence on protein structure, tended to be more frequent at the most conserved (least polymorphic) sites. With most subsets of the data, the proteins from the cyanobacterium Synechocystis tended to branch with their homologues from Gram-positive bacteria. The results indicate that excluding only a few percent of poorly alignable sites from phylogenetic analysis can have a severe impact upon the phylogeny inferred and that bootstrap support for branches can fluctuate substantially, depending upon which sites are excluded.
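The variability criterion described above, the number of different amino acids observed at a site, can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the authors' procedure: the function names are hypothetical, gap characters (`-`) are ignored when counting states (the abstract does not say how gaps were treated), and `toy_alignment` is invented data.

```python
def site_variability(alignment):
    """Count distinct amino acids in each alignment column.

    Assumption: gap characters ('-') are not counted as a state.
    """
    columns = zip(*alignment.values())
    return [len(set(column) - {"-"}) for column in columns]


def remove_most_variable(alignment, max_states):
    """Keep only the columns with at most `max_states` distinct amino acids."""
    variability = site_variability(alignment)
    keep = [i for i, states in enumerate(variability) if states <= max_states]
    return {
        name: "".join(sequence[i] for i in keep)
        for name, sequence in alignment.items()
    }


def iterative_subsets(alignment):
    """Yield (threshold, subset) pairs, dropping the most variable sites first."""
    highest = max(site_variability(alignment))
    for threshold in range(highest, 0, -1):
        yield threshold, remove_most_variable(alignment, threshold)


# Toy alignment (assumption: invented sequences of equal length).
toy_alignment = {
    "taxon_A": "MGKLV",
    "taxon_B": "MGRLI",
    "taxon_C": "MGKAV",
    "taxon_D": "MGQLL",
}

variability = site_variability(toy_alignment)
subsets = list(iterative_subsets(toy_alignment))
```

Each subset yielded by `iterative_subsets` would then be passed to a separate tree-building step, so that the inferred topology and its bootstrap support can be compared across thresholds.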