The marine unicellular cyanobacterium Prochlorococcus is the smallest-known oxygen-evolving autotroph. It numerically dominates the phytoplankton in the tropical and subtropical oceans, and is responsible for a significant fraction of global photosynthesis. Here we compare the genomes of two Prochlorococcus strains that span the largest evolutionary distance within the Prochlorococcus lineage and that have different minimum, maximum and optimal light intensities for growth. The high-light-adapted ecotype has the smallest genome (1,657,990 base pairs, 1,716 genes) of any known oxygenic phototroph, whereas the genome of its low-light-adapted counterpart is significantly larger, at 2,410,873 base pairs (2,275 genes). The comparative architectures of these two strains reveal dynamic genomes that are constantly changing in response to myriad selection pressures. Although the two strains have 1,350 genes in common, a significant number are not shared, and these have been differentially retained from the common ancestor, or acquired through duplication or lateral transfer. Some of these genes have obvious roles in determining the relative fitness of the ecotypes in response to key environmental variables, and hence in regulating their distribution and abundance in the oceans.
Background: Identifying viral sequences in mixed metagenomes containing both viral and host contigs is a critical first step in analyzing the viral component of samples. Current tools for distinguishing prokaryotic virus and host contigs primarily use gene-based similarity approaches. Such approaches can significantly limit results, especially for short contigs that have few predicted proteins or lack proteins with similarity to previously known viruses. Methods: We have developed VirFinder, the first k-mer frequency based, machine learning method for virus contig identification that entirely avoids gene-based similarity searches. VirFinder instead identifies viral sequences based on our empirical observation that viruses and hosts have discernibly different k-mer signatures. VirFinder’s performance in correctly identifying viral sequences was tested by training its machine learning model on sequences from host and viral genomes sequenced before 1 January 2014 and evaluating on sequences obtained after 1 January 2014. Results: VirFinder had significantly better rates of identifying true viral contigs (true positive rates (TPRs)) than VirSorter, the current state-of-the-art gene-based virus classification tool, when evaluated with either contigs subsampled from complete genomes or assembled from a simulated human gut metagenome. For example, for contigs subsampled from complete genomes, VirFinder had 78-, 2.4-, and 1.8-fold higher TPRs than VirSorter for 1, 3, and 5 kb contigs, respectively, at the same false positive rates as VirSorter (0, 0.003, and 0.006, respectively). VirFinder thus works considerably better than VirSorter for small contigs. VirFinder furthermore identified several recently sequenced virus genomes (after 1 January 2014) that VirSorter did not and that have no nucleotide similarity to previously sequenced viruses, demonstrating VirFinder’s potential advantage in identifying novel viral sequences.
Application of VirFinder to a set of human gut metagenomes from healthy and liver cirrhosis patients reveals higher viral diversity in healthy individuals than in cirrhosis patients. We also identified contig bins containing crAssphage-like contigs with higher abundance in healthy patients and a putative Veillonella genus prophage associated with cirrhosis patients. Conclusions: This innovative k-mer based tool complements gene-based approaches and will significantly improve prokaryotic viral sequence identification, especially for metagenomic-based studies of viral ecology. Electronic supplementary material: The online version of this article (doi:10.1186/s40168-017-0283-5) contains supplementary material, which is available to authorized users.
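The k-mer signature idea described in the VirFinder abstract can be illustrated with a short sketch that turns a contig into a normalized k-mer frequency vector. This is not VirFinder's implementation: the function name `kmer_frequencies`, the default `k=4`, and the choice to skip windows containing ambiguous bases are all assumptions made for illustration, and the abstract does not say which k or which classifier the tool uses.

```python
from itertools import product


def kmer_frequencies(seq, k=4):
    """Return normalized k-mer frequencies over the A/C/G/T alphabet.

    Hypothetical helper for illustration only. The vector has 4**k entries,
    ordered lexicographically (AAAA, AAAC, ..., TTTT for k=4).
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing N or other ambiguity codes
            counts[word] += 1
            total += 1
    if total == 0:
        # Sequence shorter than k, or no unambiguous window: return all zeros.
        return [0.0] * len(kmers)
    return [counts[w] / total for w in kmers]
```

A vector like this could then be passed as features to any standard classifier trained on labeled viral and host contigs; which model to use is left open here.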
The recent development of metagenomic sequencing makes it possible to sequence microbial genomes including viruses in an environmental sample. Identifying viral sequences from metagenomic data is critical for downstream virus analyses. The existing reference-based and gene homology-based methods are not efficient in identifying unknown viruses or short viral sequences. Here we have developed a reference-free and alignment-free machine learning method, DeepVirFinder, for predicting viral sequences in metagenomic data using deep learning techniques. DeepVirFinder was trained based on a large number of viral sequences discovered before May 2015. Evaluated on the sequences after that date, DeepVirFinder outperformed the state-of-the-art method VirFinder at all contig lengths. Enlarging the training data by adding millions of purified viral sequences from environmental metavirome samples significantly improves the accuracy for predicting underrepresented viruses. Applying DeepVirFinder to real human gut metagenomic samples from patients with colorectal carcinoma (CRC) identified 51,138 viral sequences belonging to 175 bins. Ten bins were associated with the cancer status, indicating their potential use for non-invasive diagnosis of CRC. In summary, DeepVirFinder greatly improved the precision and recall rates of viral identification, and it will significantly accelerate the discovery rate of viruses.
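Both the VirFinder and DeepVirFinder abstracts describe training on sequences available before a cutoff date and evaluating on sequences released afterwards. A minimal sketch of such a date-based split follows. The record layout, the sequence identifiers, and the function name `temporal_split` are hypothetical, and treating "before May 2015" as "strictly before 1 May 2015" is an assumption about how the boundary is handled.

```python
from datetime import date


def temporal_split(records, cutoff):
    """Split (sequence_id, release_date) records into train and test sets.

    Records dated strictly before `cutoff` go to training; the rest go to
    testing. This mimics evaluating on sequences unseen at training time.
    """
    train = [r for r in records if r[1] < cutoff]
    test = [r for r in records if r[1] >= cutoff]
    return train, test


# Hypothetical records, invented for illustration only.
records = [
    ("seq_a", date(2013, 6, 1)),
    ("seq_b", date(2015, 4, 30)),
    ("seq_c", date(2015, 5, 1)),
    ("seq_d", date(2016, 2, 14)),
]
train, test = temporal_split(records, cutoff=date(2015, 5, 1))
```

Splitting by date rather than at random keeps later-discovered sequences out of training, which is closer to the real use case of detecting viruses that were unknown when the model was built.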
Viruses and their host genomes often share similar oligonucleotide frequency (ONF) patterns, which can be used to predict the host of a given virus by finding the host with the greatest ONF similarity. We comprehensively compared 11 ONF metrics using several k-mer lengths for predicting host taxonomy from among ∼32 000 prokaryotic genomes for 1427 virus isolate genomes whose true hosts are known. The background-subtracting measure $d_2^*$ at k = 6 gave the highest host prediction accuracy (33%, genus level) with reasonable computational times. Requiring a maximum dissimilarity score for making predictions (thresholding) and taking the consensus of the 30 most similar hosts further improved accuracy. Using a previous dataset of 820 bacteriophage and 2699 bacterial genomes, $d_2^*$ host prediction accuracies with thresholding and consensus methods (genus-level: 64%) exceeded previous Euclidean distance ONF (32%) or homology-based (22-62%) methods. When applied to metagenomically-assembled marine SUP05 viruses and the human gut virus crAssphage, $d_2^*$-based predictions overlapped (i.e. some same, some different) with the previously inferred hosts of these viruses.
The extent of overlap improved when only using host genomes or metagenomic contigs from the same habitat or samples as the query viruses. The $d_2^*$ ONF method will greatly improve the characterization of novel, metagenomic viruses.
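To make the idea of a background-subtracted ONF dissimilarity concrete, the sketch below computes a $d_2^*$-style score between two sequences and picks the least dissimilar host, with optional thresholding. It rests on several assumptions not stated in the abstract: expected k-mer counts come from a zero-order (independent bases) background model estimated from each sequence, reverse complements are ignored, and the function names (`d2star`, `predict_host`) are hypothetical. Sequences are assumed to contain at least one unambiguous base.

```python
import math
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count k-mers made only of A/C/G/T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def centered_counts(seq, k):
    """Map each k-mer to (observed - expected, expected) counts.

    Expected counts assume independent bases (zero-order background),
    which is a simplifying assumption of this sketch.
    """
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    bases = [b for b in seq.upper() if b in "ACGT"]
    probs = {b: bases.count(b) / len(bases) for b in "ACGT"}
    out = {}
    for letters in product("ACGT", repeat=k):
        word = "".join(letters)
        expected = total * math.prod(probs[b] for b in word)
        if expected > 0:
            out[word] = (counts[word] - expected, expected)
    return out


def d2star(seq_x, seq_y, k=6):
    """Dissimilarity in [0, 1]; 0 means identical standardized profiles."""
    x = centered_counts(seq_x, k)
    y = centered_counts(seq_y, k)
    shared = x.keys() & y.keys()
    num = sum(x[w][0] * y[w][0] / math.sqrt(x[w][1] * y[w][1]) for w in shared)
    norm_x = math.sqrt(sum(x[w][0] ** 2 / x[w][1] for w in shared))
    norm_y = math.sqrt(sum(y[w][0] ** 2 / y[w][1] for w in shared))
    if norm_x == 0 or norm_y == 0:
        return 0.5  # no usable signal: treat the profiles as uncorrelated
    return 0.5 * (1 - num / (norm_x * norm_y))


def predict_host(virus_seq, host_seqs, k=6, max_dissimilarity=None):
    """Return the name of the least dissimilar host, or None if above threshold."""
    scores = {name: d2star(virus_seq, seq, k) for name, seq in host_seqs.items()}
    best = min(scores, key=scores.get)
    if max_dissimilarity is not None and scores[best] > max_dissimilarity:
        return None
    return best
```

The consensus step mentioned in the abstract (voting over the 30 most similar hosts) is omitted here; it could be layered on top by sorting `scores` and taking the most common taxon among the top entries.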
Marine picocyanobacteria, comprising the genera Synechococcus and Prochlorococcus, are the most abundant and widespread primary producers in the ocean. More than 20 genetically distinct clades of marine Synechococcus have been identified, but their physiology and biogeography are not as thoroughly characterized as those of Prochlorococcus. Using clade-specific qPCR primers, we measured the abundance of 10 Synechococcus clades at 92 locations in surface waters of the Atlantic and Pacific Oceans. We found that Synechococcus partition the ocean into four distinct regimes distinguished by temperature, macronutrients and iron availability. Clades I and IV were prevalent in colder, mesotrophic waters; clades II, III and X dominated in the warm, oligotrophic open ocean; clades CRD1 and CRD2 were restricted to sites with low iron availability; and clades XV and XVI were only found in transitional waters at the edges of the other biomes. Overall, clade II was the most ubiquitous clade investigated and was the dominant clade in the largest biome, the oligotrophic open ocean. Co-occurring clades that occupy the same regime belong to distinct evolutionary lineages within Synechococcus, indicating that multiple ecotypes have evolved independently to occupy similar niches and represent examples of parallel evolution. We speculate that parallel evolution of ecotypes may be a common feature of diverse marine microbial communities that contributes to functional redundancy and the potential for resiliency.
Marine Synechococcus is a globally significant genus of cyanobacteria that comprises multiple genetic lineages or clades. These clades are thought to represent ecologically distinct units, or ecotypes. Because multiple clades often co-occur in the oceans, Synechococcus are ideal microbes to explore how closely related bacterial taxa within the same functional guild of organisms co-exist and partition marine habitats. Here we sequenced multiple gene loci from cultured strains to confirm the congruency of clade classifications between the 16S–23S rDNA internally transcribed spacer (ITS), 16S rDNA, narB, ntcA, and rpoC1 loci commonly used in Synechococcus diversity studies. We designed quantitative PCR (qPCR) assays that target the ITS for 10 Synechococcus clades, including four clades, XV, XVI, CRD1, and CRD2, not covered by previous assays employing other loci. Our new qPCR assays are very sensitive and specific, detecting down to tens of cells per ml. Application of these qPCR assays to field samples from the northwest Atlantic showed clear shifts in Synechococcus community composition across a coastal to open-ocean transect. Consistent with previous studies, clades I and IV dominated cold, coastal Synechococcus communities. Clades II and X were abundant at the two warmer, off-shore stations, and at all stations multiple Synechococcus clades co-occurred. The qPCR assays developed here provide valuable tools to further explore the dynamics of microbial community structure and the mechanisms of co-existence.
Marine microbial communities often contain multiple closely related phylogenetic clades, but in many cases, it is still unclear what physiological traits differentiate these putative ecotypes. The numerically abundant marine cyanobacterium Synechococcus can be divided into at least 14 clades. In order to better understand ecotype differentiation in this genus, we assessed the diversity of a Synechococcus community from a well-mixed water column in the Sargasso Sea during March 2002, a time of year when this genus typically reaches its annual peak in abundance. Diversity was estimated from water sampled at three depths (approximately 5, 70, and 170 m) using both culture isolation and construction of cyanobacterial 16S-23S rRNA internal transcribed sequence clone libraries. Clonal isolates were obtained by enrichment with ammonium, nitrite, or nitrate as the sole N source, followed by pour plating. Each method sampled the in situ diversity differently. The combined methods revealed a total of seven Synechococcus phylotypes including two new putative ecotypes, labeled XV and XVI. Although most other isolates grow on nitrate, clade XV exhibited a reduced efficiency in nitrate utilization, and both clade XV and XVI are capable of chromatic adaptation, demonstrating that this trait is more widely distributed among Synechococcus strains than previously known. Thus, as in its sister genus Prochlorococcus, light and nitrogen utilization are important factors in ecotype differentiation in the marine Synechococcus lineage.
The closely related cyanobacteria Synechococcus and Prochlorococcus have different distributions in stratified water columns in the northern Sargasso Sea. The abundance of Synechococcus is relatively uniform with depth, but Prochlorococcus cell numbers are low within shallow mixed layers and high in and below the thermocline. Because free cupric ion (free Cu²⁺) concentrations are high (up to 6 pM) in shallow mixed layers and lower in deeper water, there is an inverse relationship between Prochlorococcus densities and the free Cu²⁺ concentration. We explored the possibility of a causal underpinning for this relationship by examining the relative copper sensitivities of Prochlorococcus and Synechococcus in cultures and field populations. Prochlorococcus isolates from both the high- and low-light adapted ecotypes were inhibited at free Cu²⁺ concentrations that had no effect on Synechococcus. However, the high-light adapted strains were more copper resistant than their low-light adapted counterparts. When copper was added to Prochlorococcus from environments where the in situ free Cu²⁺ was low (in deeply mixed water columns and below the mixed layer in stratified conditions), net growth rates were substantially reduced and cells arrested in the G1 and early S phases of the cell cycle. Prochlorococcus in shallow mixed layers were less sensitive to copper and were probably members of the copper-resistant high-light adapted ecotype. Synechococcus were relatively copper resistant across a range of environments in the Sargasso Sea. These observations are consistent with our hypothesis that copper plays a role in cyanobacteria ecology in the Sargasso Sea.

Human activities have produced a measurable increase in the concentrations of trace metals in even remote environments. For instance, the atmospheric flux of copper to Greenland ice has increased by over an order of magnitude from the pre-Roman period to the present (Hong et al. 1996).
Corresponding author: chisholm@mit.edu. Acknowledgments: We thank P. Lam and H. Hsu for invaluable technical assistance in the laboratory and M. Saito for stimulating discussions. We also thank the captain and crew of the RV Oceanus for making the field experiments possible. Comments from two anonymous reviewers have been very helpful in revising the manuscript.