It is well known that basing phylogenetic reconstructions on uncorrected genetic distances can lead to errors. Nevertheless, it is common practice to report simply the most similar BLAST (Altschul et al. 1997) hit in genomic reports that discuss many genes (Ruepp et al. 2000; Freiberg et al. 1997). This is because BLAST hits can provide a rapid, efficient, and concise analysis of many genes at once. These hits are often interpreted to imply that the gene is most closely related to the gene or protein in the databases that returned the closest BLAST hit. Though the most similar hit and the closest relative may coincide, for many genes, particularly genes with few homologs, they may not be the same. There are a number of circumstances that can account for such limitations in accuracy (Eisen 2000). We stress here that genes appearing to be the most similar based on BLAST hits are often not each other's closest relatives phylogenetically. The extent to which this occurs depends on the availability of close relatives in the databases. As an example we have chosen the analysis of the genomes of the crenarchaeote Aeropyrum pernix, an organism with few close relatives fully sequenced, and Escherichia coli, an organism whose closest relative, Salmonella typhimurium, is completely sequenced.
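To make the point about uncorrected distances concrete, here is a minimal Python sketch. It assumes aligned nucleotide sequences and uses the Jukes-Cantor (JC69) correction purely as an illustration; the abstract does not say which correction, if any, the authors applied, and the function names are hypothetical.

```python
import math


def p_distance(seq_a, seq_b):
    """Uncorrected distance: fraction of aligned sites that differ.

    Assumes the two sequences are already aligned and contain no gaps.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


def jc69_distance(p):
    """Jukes-Cantor corrected distance for an observed proportion p of differing sites.

    The correction accounts for multiple substitutions at the same site and is
    undefined once p reaches 0.75 (saturation).
    """
    if p >= 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The corrected distance is always at least as large as the observed proportion of differences, and the gap widens as sequences diverge. A ranking by raw similarity can therefore differ from one that accounts for the substitution process.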
Large-scale genome arrangement plays an important role in bacterial genome evolution. A substantial number of genes can be inserted into, deleted from, or rearranged within genomes during evolution. Detecting or inferring gene insertions/deletions is of interest because such information provides insights into bacterial genome evolution and speciation. However, efficient inference of genome events is difficult because genome comparisons alone do not generally supply enough information to distinguish insertions, deletions, and other rearrangements. In this study, homologous genes from the complete genomes of 13 closely related bacteria were examined. The presence or absence of genes from each genome was cataloged, and a maximum likelihood method was used to infer insertion/deletion rates according to the phylogenetic history of the taxa. It was found that whole gene insertions/deletions in genomes occur at rates comparable to or greater than the rate of nucleotide substitution and that higher insertion/deletion rates are often inferred to be present at the tips of the phylogeny with lower rates on more ancient interior branches. Recently transferred genes are under faster and relaxed evolution compared with more ancient genes. Together, this implies that many of the lineage-specific insertions are lost quickly during evolution and that perhaps a few of the genes inserted by lateral transfer are niche specific.
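The cataloging step described above can be sketched as a presence/absence matrix. This is a minimal Python illustration under stated assumptions: the family and genome identifiers are hypothetical, homologous families are taken as already defined, and the sketch stops short of the maximum likelihood rate inference, which the abstract does not specify in detail.

```python
def presence_absence_matrix(family_members, genomes):
    """Map each gene family to a 0/1 row, one column per genome in `genomes`.

    family_members: dict of family id -> set of genome ids containing a member.
    """
    return {
        family: [1 if genome in members else 0 for genome in genomes]
        for family, members in family_members.items()
    }


def lineage_specific(matrix):
    """Return the families present in exactly one genome, sorted by id."""
    return sorted(family for family, row in matrix.items() if sum(row) == 1)


# Hypothetical toy input: three genomes and three gene families.
genomes = ["genome_A", "genome_B", "genome_C"]
family_members = {
    "fam1": {"genome_A", "genome_B", "genome_C"},
    "fam2": {"genome_B"},
    "fam3": {"genome_A", "genome_C"},
}
matrix = presence_absence_matrix(family_members, genomes)
```

Rows such as the one for `fam2`, present in a single genome, correspond to candidate insertions or deletions near the tips of the phylogeny.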
As a means of investigating gene function, we developed a robust transcription fusion reporter vector to measure gene expression in bacteria. The vector, pTH1522, was used to construct a random insert library for the Sinorhizobium meliloti genome. pTH1522 replicates in Escherichia coli and can be transferred to, but cannot replicate in, S. meliloti. Homologous recombination of the DNA fragments cloned in pTH1522 into the S. meliloti genome generates transcriptional fusions to either the reporter genes gfp+ and lacZ or gusA and rfp, depending on the orientation of the cloned fragment. Over 12,000 fusion junctions in 6,298 clones were identified by DNA sequence analysis, and the plasmid clones were recombined into S. meliloti. Reporter enzyme activities following growth of these recombinants in complex medium (LBmc) and in minimal medium with glucose or succinate as the sole carbon source allowed the identification of genes highly expressed under one or more growth conditions and those expressed at very low to background levels. In addition to generating reporter gene fusions, the vector allows Flp recombinase-directed deletion formation and gene disruption, depending on the nature of the cloned fragment. We report the identification of genes essential for growth on complex medium as deduced from an inability to recover recombinants from pTH1522 clones that carried fragments internal to gene or operon transcripts. A database containing all the gene expression activities together with a web interface showing the precise locations of reporter fusion junctions has been constructed (www.sinorhizobium.org).
Simple sequence is abundant in the proteins that have been sequenced to date. But unusual protein features, such as simple sequence, are not present at the same high frequency within structural databases. A subset of these simple sequences, a group with a highly repetitive nature, has been shown to be abundant in eukaryotes but not in prokaryotes. In this study, an examination of the eukaryotic proteins in the Protein Data Bank (PDB) has revealed a large deficiency of low-complexity, highly repetitive protein repeats. Through simulated databases of similar samples of eukaryotic proteins taken from the National Center for Biotechnology Information (NCBI) database, it is shown that the PDB contains significantly less highly repetitive simple sequence than artificial databases of similar composition randomly derived from NCBI. When the structural data for those few PDB sequences that did contain highly repetitive simple sequence are examined in detail, it is found that in most cases the tertiary structure is unknown for the regions consisting of simple sequence. This lack of simple sequence both in the PDB database and in the structural information suggests that this type of simple sequence may produce disordered structures that make structural characterization difficult.
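The comparison against randomly derived databases can be sketched as a simple resampling test. This Python sketch is an illustration only. It assumes each protein in the reference pool has already been flagged as containing or lacking highly repetitive simple sequence, and it matches samples by size alone rather than by composition; the function name, parameters, and the add-one p-value estimate are assumptions, not the authors' procedure.

```python
import random


def empirical_p_value(observed_count, pool_flags, sample_size, n_draws=999, seed=0):
    """Estimate how often a random sample contains as few flagged proteins as observed.

    observed_count: number of flagged proteins in the real (e.g. structural) set.
    pool_flags: list of booleans, one per protein in the reference pool.
    sample_size: number of proteins per simulated database.
    Returns the one-sided empirical p-value with the usual add-one correction.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    at_or_below = 0
    for _ in range(n_draws):
        draw = rng.sample(pool_flags, sample_size)  # sampling without replacement
        if sum(draw) <= observed_count:
            at_or_below += 1
    return (at_or_below + 1) / (n_draws + 1)
```

A small returned value indicates that simulated databases rarely contain as little flagged sequence as the observed set, which is the pattern the abstract reports for the PDB.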
Background: The class A scavenger receptors are a subclass of a diverse family of proteins defined based on their ability to bind modified lipoproteins. The 5 members of this family are strikingly variable in their protein structure and function, raising the question as to whether it is appropriate to group them as a family based on their ligand binding abilities. Results: To investigate these relationships, we defined the domain architecture of each of the 5 members followed by collecting and annotating class A scavenger receptor mRNA and amino acid sequences from publicly available databases. Phylogenetic analyses, sequence alignments, and permutation tests revealed a common evolutionary ancestry of these proteins, indicating that they form a protein family. We postulate that 4 distinct gene duplication events and subsequent domain fusions, internal repeats, and deletions are responsible for the diverse protein structures and functions of this family. Despite variation in domain structure, there are highly conserved regions across all 5 members, indicating the possibility that these regions may represent key conserved functional motifs. Conclusions: We have shown with significant evidence that the 5 members of the class A scavenger receptors form a protein family. We have indicated that these receptors have a common origin which may provide insight into future functional work with these proteins.
For decades proteins were thought to interact in a "lock and key" system, which led to the definition of a paradigm linking stable three-dimensional structure to biological function. As a consequence, any non-structured peptide was considered to be nonfunctional and to evolve neutrally. Surprisingly, the most commonly shared peptides between eukaryotic proteomes are low-complexity sequences that in most conditions do not present a stable three-dimensional structure. However, because these sequences evolve rapidly and because the size variation of a few of them can have deleterious effects, low-complexity sequences have been suggested to be the target of selection. Here we review evidence that supports the idea that these simple sequences should not be considered just "junk" peptides and that selection drives the evolution of many of them.
Summary
• The internal transcribed spacer (ITS) of the nuclear ribosomal DNA region is a widely used species marker for plants and fungi. Recent metagenomic studies using next-generation sequencing, however, generate only partial ITS sequences. Here we compare the performance of partial and full-length ITS sequences with several classification methods.
• We compiled a full-length ITS data set and created short fragments to simulate the read lengths commonly recovered from current next-generation sequencing platforms. We compared recovery, erroneous recovery, and coverage for the following methods: best BLAST hit classification, MEGAN classification, and automated phylogenetic assignment using the Statistical Assignment Program (SAP).
• We found that summarizing results with more inclusive taxonomic ranks increased recovery and reduced erroneous recovery. The similarity-based methods BLAST and MEGAN performed consistently across most fragment lengths. Using a phylogeny-based method, SAP runs with queries 400 bp or longer worked best. Overall, BLAST had the highest recovery rates and MEGAN had the lowest erroneous recovery rates.
• A high-throughput ITS classification method should be selected, taking into consideration read length, an acceptable tradeoff between maximizing the total number of classifications and minimizing the number of erroneous classifications, and the computational speed of the assignment method.
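The effect of summarizing at more inclusive ranks can be illustrated with a short Python sketch. The summary does not define its three measures, so the definitions here are assumptions: recovery is the fraction of queries assigned to the correct taxon at a given rank, erroneous recovery is the fraction assigned to a wrong taxon, and coverage is the fraction assigned at all. The function name and the toy taxa are hypothetical.

```python
def score_assignments(truth, predicted, rank):
    """Score classifications at one taxonomic rank.

    truth: dict of query id -> dict of rank -> true taxon name.
    predicted: dict of query id -> dict of rank -> assigned taxon name;
               queries that were not classified are simply absent.
    """
    if not truth:
        raise ValueError("truth must contain at least one query")
    correct = wrong = 0
    for query_id, true_lineage in truth.items():
        lineage = predicted.get(query_id)
        if lineage is None or rank not in lineage:
            continue  # unclassified at this rank
        if lineage[rank] == true_lineage[rank]:
            correct += 1
        else:
            wrong += 1
    total = len(truth)
    return {
        "recovery": correct / total,
        "erroneous_recovery": wrong / total,
        "coverage": (correct + wrong) / total,
    }


# Hypothetical toy data: four queries, one left unclassified.
truth = {
    "q1": {"genus": "Genus_a", "species": "Genus_a sp1"},
    "q2": {"genus": "Genus_a", "species": "Genus_a sp2"},
    "q3": {"genus": "Genus_b", "species": "Genus_b sp1"},
    "q4": {"genus": "Genus_b", "species": "Genus_b sp2"},
}
predicted = {
    "q1": {"genus": "Genus_a", "species": "Genus_a sp1"},
    "q2": {"genus": "Genus_a", "species": "Genus_a sp3"},
    "q4": {"genus": "Genus_c", "species": "Genus_c sp1"},
}
species_scores = score_assignments(truth, predicted, "species")
genus_scores = score_assignments(truth, predicted, "genus")
```

In this toy case, query `q2` is wrong at species rank but right at genus rank, so moving to the more inclusive rank raises recovery and lowers erroneous recovery while coverage stays the same.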
The ciliated protozoan Tetrahymena thermophila undergoes extensive programmed DNA rearrangements during the development of a somatic macronucleus from the germ line micronucleus in its sexual cycle. To investigate the relationship between programmed DNA rearrangements and transposable elements, we identified several members of a family of non-long terminal repeat (LTR) retrotransposons (retroposons) in T. thermophila, the first characterized in the ciliated protozoa. This multiple-copy retrotransposon family is restricted to the micronucleus of T. thermophila. The REP (Tetrahymena non-LTR retroposon) elements encode an ORF2 typical of non-LTR elements that contains apurinic/apyrimidinic endonuclease (APE) and reverse transcriptase (RT) domains. Phylogenetic analysis of the RT and APE domains indicates that the element forms a deep-branching clade within the non-LTR retrotransposon family. Northern analysis with a probe to the conserved RT domain indicates that transcripts from the element are small and heterogeneous in length during early macronuclear development. The presence of a repeated transposable element in the genome is consistent with the model that programmed DNA deletion in T. thermophila evolved as a method of eliminating deleterious transposons from the somatic macronucleus.
Developmentally programmed DNA rearrangements occur in a wide variety of organisms (reviewed in reference 5). Functions such as altering gene dosage or directly regulating gene expression have been assigned to many but not all examples of programmed DNA rearrangements. A clinically important example of a programmed DNA rearrangement is V(D)J recombination (2). In addition, a variety of mammalian parasites use programmed DNA rearrangements to vary their surface antigens to avoid host immune response (4). The function of other programmed DNA rearrangements is not as clear. The extensive genome rearrangements that occur during nuclear development in the ciliated protozoa provide an example of programmed DNA rearrangements with poorly understood function.
Like all ciliated protozoa, the oligohymenophorean Tetrahymena thermophila displays nuclear dimorphism with a mostly transcriptionally silent diploid germ line nucleus (micronucleus) and a polyploid and transcriptionally active somatic nucleus (macronucleus) within the same cell. The macronucleus develops from a mitotic product of the micronucleus during conjugation. When two cells of different mating types conjugate, the micronucleus in each divides meiotically and mitotically to generate a haploid gametic nucleus that is reciprocally exchanged and fuses with that of its partner to form a zygotic nucleus. This zygotic nucleus divides and from one of the products develops a new macronucleus, while the old macronucleus is concurrently degraded. In T. thermophila, macronuclear development involves extensive programmed DNA rearrangements, including chromosome fragmentation, DNA amplification, and site-specific interstitial DNA deletion (12). Interstitial DNA deletion is responsible for t...