Bacteriophages are the most abundant biological entities in the biosphere, and this dynamic and old population is, not surprisingly, highly diverse genetically. Relative to bacterial genomics, phage genomics has advanced slowly, and a higher-resolution picture of the phagosphere is only just emerging. This view reveals substantial diversity even among phages known to infect a common host strain, but the relationships are complex, with mosaic genomic architectures generated by illegitimate recombination over a long period of evolutionary history.
Bacteriophages are the dark matter of the biological world (1); a vastness of ill-defined genetic variation whose impacts we observe on the microbial population but of which we have little understanding. The phage population is estimated to contain approximately 10^31 particles and is highly dynamic, with the population turning over every few days. Moreover, this rolling boil of evolution has been churning away for perhaps two billion years or more, giving rise to fantastic genetic diversity (2, 3).
It is noteworthy that these estimations of phage population size and turnover emerged primarily from observations made with water samples that are simple to collect and quantify (4). Although terrestrial environments may contribute a relatively minor part of the total numbers of phage particles in the biosphere, prokaryotic diversity in soil samples is very high (5), which is anticipated to be reflected in the companion phage populations. Quantifying the phage populations in soil and terrestrial samples can be tricky, but phages are estimated to be present at levels approaching 10^9 particles per gram of soil (6).
Two key approaches to defining viral diversity are metagenomics of total concentrated phage samples collected from the environment and a genome-by-genome strategy of analyzing individually isolated phages. The two approaches are compatible but have distinct outcomes. Metagenomics generates a large amount of sequence data and provides good indicators of diversity. Analysis of individually isolated phages generates smaller data sets, but they are structured into whole genomes. Because phage genomes are architecturally mosaic, the availability of complete genomes contextualizes the complexities of the relationships among phages (7). Moreover, individual phages are available for genetic, biochemical, and microbiological analyses, a critical resource given the evident functional and regulatory novelty within the phage population.
Metagenomics typically does not offer a confident linkage between viral sequences and potential bacterial hosts, although methods for enrichment with particular hosts have been described (8). For individually recovered phages, host ranges can be defined empirically.
Perhaps not surprisingly, phage diversity is sufficiently high that phages infecting phylogenetically distant hosts share little genetic information (9). The question then arises as to what is the diversity of viruses that infect a single host strain, in which they can be assumed to be ...