Background: The quality of automated gene prediction in microbial organisms has improved steadily over the past decade, but there is still room for improvement. Increasing the number of correct identifications, both of genes and of the translation initiation sites for each gene, and reducing the overall number of false positives, are all desirable goals. Results: With our years of experience in manually curating genomes for the Joint Genome Institute, we developed a new gene prediction algorithm called Prodigal (PROkaryotic DYnamic programming Gene-finding ALgorithm). With Prodigal, we focused specifically on the three goals of improved gene structure prediction, improved translation initiation site recognition, and reduced false positives. We compared the results of Prodigal to existing gene-finding methods to demonstrate that it met each of these objectives. Conclusion: We built a fast, lightweight, open source gene prediction program called Prodigal (http://compbio.ornl.gov/prodigal/). Prodigal achieved good results compared to existing methods, and we believe it will be a valuable asset to automated microbial annotation pipelines.
The marine unicellular cyanobacterium Prochlorococcus is the smallest-known oxygen-evolving autotroph. It numerically dominates the phytoplankton in the tropical and subtropical oceans, and is responsible for a significant fraction of global photosynthesis. Here we compare the genomes of two Prochlorococcus strains that span the largest evolutionary distance within the Prochlorococcus lineage and that have different minimum, maximum and optimal light intensities for growth. The high-light-adapted ecotype has the smallest genome (1,657,990 base pairs, 1,716 genes) of any known oxygenic phototroph, whereas the genome of its low-light-adapted counterpart is significantly larger, at 2,410,873 base pairs (2,275 genes). The comparative architectures of these two strains reveal dynamic genomes that are constantly changing in response to myriad selection pressures. Although the two strains have 1,350 genes in common, a significant number are not shared, and these have been differentially retained from the common ancestor, or acquired through duplication or lateral transfer. Some of these genes have obvious roles in determining the relative fitness of the ecotypes in response to key environmental variables, and hence in regulating their distribution and abundance in the oceans.
Marine unicellular cyanobacteria are responsible for an estimated 20-40% of chlorophyll biomass and carbon fixation in the oceans. Here we have sequenced and analysed the 2.4-megabase genome of Synechococcus sp. strain WH8102, revealing some of the ways that these organisms have adapted to their largely oligotrophic environment. WH8102 uses organic nitrogen and phosphorus sources and more sodium-dependent transporters than a model freshwater cyanobacterium. Furthermore, it seems to have adopted strategies for conserving limited iron stores by using nickel and cobalt in some enzymes, has reduced its regulatory machinery (consistent with the fact that the open ocean constitutes a far more constant and buffered environment than fresh water), and has evolved a unique type of swimming motility. The genome of WH8102 seems to have been greatly influenced by horizontal gene transfer, partially through phages. The genetic material contributed by horizontal gene transfer includes genes involved in the modification of the cell surface and in swimming motility. On the basis of its genome, WH8102 is more of a generalist than two related marine cyanobacteria.
Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genomes is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90% of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methylation sites along the entire chromosome. The diversity of bacterial communities is extensive, as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families.
Why do we care about bacterial genome sequencing? There are many practical applications, such as genome-scale metabolic modeling, biosurveillance, bioforensics, and infectious disease epidemiology. In the near future, high-throughput sequencing of patient metagenomic samples could revolutionize medicine in terms of speed and accuracy of finding pathogens and knowing how to treat them.
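The pan- and core-genome calculation mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes that genes have already been clustered into gene families by some upstream step (not shown), and that each genome is represented simply as a set of family identifiers. The function name `pan_and_core` and the toy identifiers are hypothetical, chosen only for illustration; this is not the method used in the review.

```python
def pan_and_core(genomes):
    """Compute the pan- and core genome from per-genome gene-family sets.

    genomes: dict mapping a genome name to an iterable of gene-family IDs.
             (Assumption: family clustering was done beforehand.)
    Returns (pan, core):
      pan  = families present in at least one genome (set union)
      core = families present in every genome (set intersection)
    """
    family_sets = [set(families) for families in genomes.values()]
    if not family_sets:
        return set(), set()
    pan = set().union(*family_sets)
    core = set.intersection(*family_sets)
    return pan, core


# Toy input with made-up family IDs; real analyses involve thousands of genomes.
toy_genomes = {
    "strain_A": {"fam1", "fam2", "fam3"},
    "strain_B": {"fam1", "fam2", "fam4"},
    "strain_C": {"fam1", "fam5"},
}
toy_pan, toy_core = pan_and_core(toy_genomes)
```

In this toy case the core genome shrinks to the single family shared by all three strains while the pan-genome grows with each strain added, which is the same qualitative pattern as the E. coli figures quoted above (a core of about 3100 families against about 89,000 families in total).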
The Prodigal software is freely available under the General Public License from http://code.google.com/p/prodigal/.
Nitrosomonas europaea (ATCC 19718) is a gram-negative obligate chemolithoautotroph that can derive all its energy and reductant for growth from the oxidation of ammonia to nitrite. Nitrosomonas europaea participates in the biogeochemical N cycle in the process of nitrification. Its genome consists of a single circular chromosome of 2,812,094 bp. The GC skew analysis indicates that the genome is divided into two unequal replichores. Genes are distributed evenly around the genome, with ~47% transcribed from one strand and ~53% transcribed from the complementary strand. A total of 2,460 protein-encoding genes emerged from the modeling effort, averaging 1,011 bp in length, with intergenic regions averaging 117 bp. Genes necessary for the catabolism of ammonia, energy and reductant generation, biosynthesis, and CO2 and NH3 assimilation were identified. In contrast, genes for catabolism of organic compounds are limited. Genes encoding transporters for inorganic ions were plentiful, whereas genes encoding transporters for organic molecules were scant. Complex repetitive elements constitute ca. 5% of the genome. Among these are 85 predicted insertion sequence elements in eight different families. The strategy of N. europaea to accumulate Fe from the environment involves several classes of Fe receptors with more than 20 genes devoted to these receptors. However, genes for the synthesis of only one siderophore, citrate, were identified in the genome. This genome has provided new insights into the growth and metabolism of ammonia-oxidizing bacteria.
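The GC skew analysis mentioned in this abstract is commonly computed as (G - C) / (G + C) over windows along one strand. The Python sketch below shows that standard formula together with a cumulative sum, whose minimum and maximum are often used as rough estimates of where the skew changes sign and hence where the replichores meet. The window size, the function names, and the use of non-overlapping windows are assumptions made for illustration; the abstract does not describe how the authors performed their analysis.

```python
def gc_skew(seq, window):
    """Return (G - C) / (G + C) for consecutive non-overlapping windows.

    A trailing fragment shorter than `window` is ignored.
    Windows containing no G or C get a skew of 0.0.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


def cumulative_skew(skews):
    """Running sum of the per-window skew values."""
    total = 0.0
    running = []
    for value in skews:
        total += value
        running.append(total)
    return running


def skew_extremes(seq, window):
    """Return (index of minimum, index of maximum) of the cumulative skew.

    The indices are window numbers, not base positions. Treating them as
    replichore boundaries is a heuristic, not a definitive assignment.
    """
    running = cumulative_skew(gc_skew(seq, window))
    if not running:
        raise ValueError("sequence is shorter than one window")
    min_idx = min(range(len(running)), key=running.__getitem__)
    max_idx = max(range(len(running)), key=running.__getitem__)
    return min_idx, max_idx
```

If the two extremes are not separated by roughly half of the windows on a circular chromosome, the two replichores differ in length, which is the kind of asymmetry the abstract reports.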