To investigate whether genome sequencing yields more useful markers than those currently used to study the epidemiology of tuberculosis, it was applied to three Mycobacterium tuberculosis isolates of the Harlingen outbreak. Our findings suggest that single nucleotide polymorphisms can be used to identify transmission chains in restriction fragment length polymorphism clusters.

Molecular typing contributes significantly to our understanding of the epidemiology of tuberculosis. A variety of genetic markers, such as IS6110 restriction fragment length polymorphism (RFLP) and variable-number tandem repeat (VNTR) typing, are currently used for DNA fingerprinting of Mycobacterium tuberculosis isolates (2, 7, 12-14, 23). Unfortunately, these markers do not distinguish primary and subsequent sources of infection in long-term DNA fingerprinting surveillance, as the turnover of these markers does not match the pace of transmission (4-6). Therefore, molecular typing is inaccurate when applied over extended time periods in a given area.

In the Netherlands, IS6110 RFLP typing has been routinely used for molecular epidemiology since the early 1990s. A remarkably large outbreak began in the city of Harlingen in 1992, and this cluster grew to over 100 cases in 2008 and is still expanding (10, 11). Although a small subset of isolates of this cluster exhibited a single transposition or deletion of IS6110, it soon became impossible to distinguish sources of infection from secondary and subsequent cases in the cluster. Some contact chains in the Harlingen cluster were suggested by contact tracing, performed according to the stone-in-the-pond principle (15, 25), but the exact transmission chains could not be validated by fingerprinting of the M. tuberculosis isolates, as most of the isolates revealed the same DNA fingerprints.

For this study, three isolates from two chains of transmission in the Harlingen cluster that could be accurately determined by contact tracing were selected for genome sequencing (Fig. 1). The bacterial isolates exhibited no change in antituberculosis drug resistance or any other observable change in phenotype. Sequencing and analysis of strains SH1 and SH5, as well as the tempo and mode of evolutionary changes between these two isolates, were described in one of our earlier studies (19). The DNA of strain SH9, purified according to the method of van Soolingen et al. (24), was sequenced de novo on a GS FLX Titanium system, and raw sequencing reads with an average read length of 400 bases were assembled by using the Genome Sequencer software, version 2.0.0.22. Sequence reads, contigs, and quality scores were provided by Microsynth AG, Switzerland. The SH9 sequence consisted of 214,283,462 high-quality bases assembled in 401 contigs with 4,207,440 bases (50.9-fold coverage). In total, 95.4% of the theoretical genome size of 4.41 Mb was available for analysis. From the in silico comparison of the three genomes, eight polymorphic single nucleotide polymorphisms (SNPs) were verified by subsequent rese...