In this work, we exploit recent advances in metagenomic assembly and bacteriophage identification to describe the phage content of saliva from 5 mother-baby pairs sampled twice, 7-11 months apart, during the first year of the babies' lives. We identify 25 phage genomes, each composed of between 1 and 71 contigs, 16 of which consist of a single contig. At the detectable level, phage were sparsely distributed: the most common one was present in 4 of the 20 samples, derived from two mothers and one baby. However, if they were present in the early time point sample from an individual, they were also present in the later sample from the same person more frequently than expected by

DNA and library preparation

DNA was isolated with QIAamp DNA mini kits (QIAgen, www.qiagen.com) using the included protocol, augmented by a bead-beating step to increase bacterial cell lysis. 100 ng of DNA in a volume of 50 µl was fragmented in a Covaris S2 instrument with Intensity 5, Duty cycle 10%, Cycles per burst 200 and Treatment time 50 s. 40 ng of the fragmented DNA was used to make Illumina sequencing libraries with the NEB Next DNA library kit for Illumina (New England Biolabs, www.neb.com). For 3 samples (family 1 mother 10 month, family 2 baby 3 month, and family 2 baby 10 month) we performed enrichment with the NEB Next Microbiome Enrichment Kit (New England Biolabs, www.neb.com) and sequenced both the pre-enrichment and post-enrichment samples. For one sample (family 1 mother 2 month), we sequenced only an enriched sample. Comparisons of the pre- and post-enrichment samples with Metaphlan2 (17) showed only minor differences in the bacterial content of the two samples, so we pooled sequence data from the three repeated samples and used only the enriched data for the one sample.
We sequenced the pooled libraries in one lane of the HiSeq 4000 (Illumina, www.illumina.com) with 150 bp paired-end (PE) chemistry.

Sequence filtering and assembly

The sequencing reads were trimmed to remove adapter sequences and low-quality regions with Trimmomatic v. 0.32 (18), using the parameters ILLUMINACLIP::2:30:10:1:true and MINLEN:70. Human reads were removed by mapping against the human genome (human_g1k_v37.fasta from ftp://ftp.ncbi.nlm.nih.gov/1000genomes/ftp/technical/reference/) with BWA-MEM 0.7.8 (19), and processing with FilterSamReads and SamToFastq from Picard tools 1.112 (github.com/broadinstitute/picard). The reads from all samples were co-assembled using MEGAHIT v1.0.6-3-gfb1e59b (20). The assembly was examined with metaquast (21). The human-depleted sequences are deposited in the NCBI SRA associated with BioProject accession PRJNA448135.
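
The logic of the human-read removal step can be illustrated with a minimal Python sketch. It is not the pipeline used here, which relied on Picard FilterSamReads and SamToFastq with options not given above. The sketch assumes SAM-format alignment records as input, and its function names, along with the rule that a pair is discarded when any of its records aligns to the human reference, are assumptions made for illustration. The second function mirrors the effect of the MINLEN:70 setting on a single read.

```python
# Illustrative sketch only; names and the discard rule are assumptions,
# not the Picard-based procedure used in the study.

UNMAPPED = 0x4  # SAM FLAG bit: this record is unmapped


def non_human_read_names(sam_lines):
    """Return sorted names of reads for which no record aligned to the reference."""
    seen = set()
    mapped = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        seen.add(name)
        if not flag & UNMAPPED:
            # Any aligned record (either mate, primary or not) marks the pair as human.
            mapped.add(name)
    return sorted(seen - mapped)


def passes_minlen(sequence, min_len=70):
    """Mimic a minimum-length filter: keep reads of at least min_len bases."""
    return len(sequence) >= min_len


if __name__ == "__main__":
    def rec(name, flag, rname="*"):
        # Build a minimal 11-column SAM record for demonstration.
        return "\t".join([name, str(flag), rname, "0", "0", "*", "*", "0", "0", "ACGT", "IIII"])

    demo = [
        "@SQ\tSN:1\tLN:249250621",
        rec("pairA", 77), rec("pairA", 141),            # both mates unmapped
        rec("pairB", 99, "1"), rec("pairB", 147, "1"),  # both mates mapped
        rec("pairC", 73, "1"), rec("pairC", 133),       # one mate mapped
    ]
    print(non_human_read_names(demo))
```

Discarding a pair when either mate aligns is the more conservative choice for human depletion. A rule requiring both mates to align would retain more reads at the cost of keeping some human sequence.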
Clustering of contigs and identification of phage-encoding contigs

We processed the contigs through the CONCOCT clustering pipeline (16), which incorporates the following steps: (1) Co-assembly of all samples using MEGAHIT (20); (2) Discarding contigs ...