Presently, inferring the long-range structure of DNA templates is limited by short read lengths, and accurate template counts suffer from distortions that occur during PCR amplification. We explore the utility of introducing random mutations in identical or nearly identical templates to create distinguishable patterns that are inherited during subsequent copying. We simulate applications of this process under assumptions of error-free sequencing and perfect mapping, using cytosine deamination as a model for mutation. The simulations demonstrate that, within readily achievable conditions of nucleotide conversion and sequence coverage, we can accurately count the number of otherwise identical molecules as well as connect variants separated by long spans of identical sequence. We discuss many potential applications, such as transcript profiling, isoform assembly, haplotype phasing, and de novo genome assembly.

mutational tagging | expression profiling | copy number variation

Some problems in genomic analysis have remained difficult despite the development of high-throughput sequencing methods. Many of these problems arise from the inability to distinguish identical and nearly identical template sequences. Counting molecules of identical composition in an RNA sequencing assay, or the copy number of identical stretches of DNA, currently depends on quantitative methods that adjust imperfectly for the distortions of data caused by sample processing. Moreover, because read lengths are short, determining the physical connection of distinguishable elements separated by long identical stretches is difficult or impossible, which limits our ability to phase single nucleotide variants (SNVs), identify transcript isoforms, and assemble through repetitive genomic regions. We propose a method that solves these problems by randomly mutagenizing the original template molecules.
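As a toy illustration of this proposal, the Python sketch below mutagenizes identical templates by converting each cytosine to thymine with a fixed probability, copies each mutated template an uneven number of times to mimic amplification bias, and then counts distinct mutational patterns. It is not the simulation used in this work: the function names, the template length, the conversion rate, the copy-number range, and the use of full-length error-free reads are all assumptions chosen for illustration.

```python
import random


def deaminate(template, rate, rng):
    """Convert each C to T independently with probability `rate` (assumed mutation model)."""
    return "".join(
        "T" if base == "C" and rng.random() < rate else base
        for base in template
    )


def count_templates(reads):
    """Estimate the number of original templates as the number of distinct patterns."""
    return len(set(reads))


def simulate(n_templates=50, length=400, rate=0.5, seed=0):
    """Return (true template count, number of reads, estimated template count)."""
    rng = random.Random(seed)
    # One random sequence stands in for a pool of identical templates.
    template = "".join(rng.choice("ACGT") for _ in range(length))
    # Mutagenize every template molecule independently before any copying.
    mutated = [deaminate(template, rate, rng) for _ in range(n_templates)]
    # Uneven amplification: each mutated template is copied 1 to 100 times
    # (hypothetical range). Copies inherit the pattern exactly, since
    # sequencing is assumed error-free and reads are full length.
    reads = [m for m in mutated for _ in range(rng.randint(1, 100))]
    rng.shuffle(reads)
    return n_templates, len(reads), count_templates(reads)


if __name__ == "__main__":
    true_n, n_reads, estimate = simulate()
    print(f"templates={true_n} reads={n_reads} distinct patterns={estimate}")
```

Under these assumptions the number of distinct patterns should match the number of templates however unevenly the copies are distributed, provided each template carries enough convertible positions for pattern collisions to be negligible.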
Each template thus bears a unique signature that is imprinted on all of its subsequent copies and on the fragments of those copies. Counting molecules becomes a matter of counting unique mutational patterns, and assembly becomes a matter of connecting reads with overlapping mutation patterns.

Modifying molecules to facilitate counting is not a new idea. There are several protocols in which a sequence of random nucleotides is appended to the template molecules before amplification and sequencing. This idea has been applied under a variety of names to identify PCR duplicates (1, 2), improve counting of DNA (3, 4) and RNA (5-7) templates, and reduce sequence error (8-10). Each implementation has its own name for the random nucleotide sequences; we refer to them as varietal tags (11). Counting varietal tags serves the same role as counting unique mutational signatures: it mitigates the effects of amplification bias. The advantage of tagging over mutation is that the original message is completely recoverable. The disadvantage is that the tag is confined to one end of the molecule, so that identity and count can only be distinguished within one read length of the ends. Furthe...