We describe the genome sequencing of an anonymous individual of African origin using a novel ligation-based sequencing assay that enables a unique form of error correction, improving the raw accuracy of the aligned reads to >99.9% and allowing us to call SNPs accurately with as few as two reads per allele. We collected several billion mate-paired reads, yielding ~18× haploid coverage of aligned sequence and close to 300× clone coverage. Over 98% of the reference genome is covered by at least one uniquely placed read, and 99.65% is spanned by at least one uniquely placed mate-paired clone. We identify over 3.8 million SNPs, 19% of which are novel. Mate-paired data are used to physically resolve the haplotype phase of nearly two-thirds of the genotypes obtained and to produce phased segments of up to 215 kb. We detect 226,529 intra-read indels, 5590 indels between mate-paired reads, 91 inversions, and four gene fusions. We use a novel approach to detect indels between mate-paired reads that are smaller than the standard deviation of the library's insert size, and we discover deletions in common with those detected by our intra-read approach. Dozens of mutations previously described in OMIM and hundreds of nonsynonymous single-nucleotide and structural variants in genes previously implicated in disease are identified in this individual. More genetic variation in the human genome remains to be uncovered, and we provide guidance for future surveys in populations and cancer biopsies.
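The abstract does not spell out how the error correction works. Assuming the ligation assay uses two-base ("color-space") encoding, as SOLiD-style sequencing does, each measured color is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3). The hypothetical Python sketch below shows why that helps: a true SNP changes two adjacent colors by the same amount, whereas a single measurement error changes only one, so the two cases can be told apart when a read is compared to the encoded reference. The names `encode`, `decode`, and `call_variants` are illustrative and not taken from the paper; the sketch ignores indels and quality values.

```python
# Hypothetical sketch of two-base ("color-space") encoding, ASSUMING the
# SOLiD-style code in which each color is the XOR of adjacent 2-bit base codes.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode(seq: str) -> list[int]:
    """Return one color (0-3) for each pair of adjacent bases."""
    bits = [BASE_TO_BITS[b] for b in seq]
    return [a ^ b for a, b in zip(bits, bits[1:])]


def decode(first_base: str, colors: list[int]) -> str:
    """Translate colors back to bases, given the known first base."""
    current = BASE_TO_BITS[first_base]
    out = [first_base]
    for c in colors:
        current ^= c  # each color fixes the next base relative to the previous one
        out.append(BITS_TO_BASE[current])
    return "".join(out)


def call_variants(ref: str, read_colors: list[int]) -> tuple[list[int], list[tuple[int, str]]]:
    """Compare a read's colors with the reference's colors (same length, no indels).

    Returns (color_error_positions, snps), where each SNP is (base_index, alt_base).
    Color i spans bases i and i+1, so a substitution at base j alters colors j-1
    and j by the same XOR delta. An isolated mismatch is treated as a measurement
    error. Simplifications: a SNP in the last base (one altered color) is reported
    as an error, and two adjacent errors with equal deltas would look like a SNP.
    """
    ref_colors = encode(ref)
    deltas = [r ^ c for r, c in zip(ref_colors, read_colors)]
    errors: list[int] = []
    snps: list[tuple[int, str]] = []
    i = 0
    while i < len(deltas):
        d = deltas[i]
        if d == 0:
            i += 1
            continue
        if i + 1 < len(deltas) and deltas[i + 1] == d:
            # Consistent pair of mismatches: substitution at base i + 1.
            alt = BITS_TO_BASE[BASE_TO_BITS[ref[i + 1]] ^ d]
            snps.append((i + 1, alt))
            i += 2
        else:
            errors.append(i)
            i += 1
    return errors, snps
```

For example, `call_variants("ACGTAC", encode("ACTTAC"))` returns `([], [(2, "T")])`, while a read with one corrupted color, `[1, 3, 1, 0, 1]`, returns `([3], [])`. Naively decoding that corrupted read with `decode("A", [1, 3, 1, 0, 1])` gives `"ACGTTG"` instead of `"ACGTAC"`: one color error corrupts every downstream base, which is why the comparison in this sketch is made in color space rather than after decoding.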
This paper presents results on ultralong-read DNA sequencing with relatively short separation times using capillary electrophoresis with replaceable polymer matrixes. In previous work, the effectiveness of mixed replaceable solutions of linear polyacrylamide (LPA) was demonstrated, and 1000 bases were routinely obtained in less than 1 h. Substantially longer read lengths have now been achieved through a combination of improved formulation of LPA mixtures, optimization of temperature and electric field, adjustment of the sequencing reaction, and refinement of the base-caller. The average molar masses of the LPA used as DNA separation matrixes were measured by gel permeation chromatography and multiangle laser light scattering. Newly formulated matrixes comprising 0.5% (w/w) 270 kDa and 2% (w/w) 10 or 17 MDa LPA raised the optimum column temperature from 60 to 70 °C, increasing the selectivity for large DNA fragments while maintaining high selectivity for small fragments. This improved resolution was further enhanced by reducing the electric field strength from 200 to 125 V/cm. In addition, because the low signal from G-terminated fragments under the standard reaction protocol of a commercial dye-primer kit diminished sequencing accuracy beyond 1000 bases, the amount of these fragments was doubled. Augmenting the base-calling expert system with rules specific to low peak resolution also had a significant effect, contributing slightly less than half of the total increase in read length. With full optimization, the read length reached up to 1300 bases (average 1250) with 98.5% accuracy in 2 h for a single-stranded M13 template.
Long, accurate reads are an important factor for high-throughput de novo DNA sequencing. In previous work from this laboratory, a separation matrix of high-weight-average molecular mass (HMM) linear polyacrylamide (LPA) at a concentration of 2% (w/w) was used to separate 1000 bases of DNA sequence in 80 min with an accuracy close to 97% (Carrilho, E.; et al. Anal. Chem. 1996, 68, 3305-3313). In the present work, significantly improved speed and sequencing accuracy have been achieved by further optimizing the factors that affect electrophoretic separation and data processing. A replaceable matrix containing a mixture of 2.0% (w/w) HMM (9 MDa) and 0.5% (w/w) low-weight-average molecular mass (50 kDa) LPA was employed to enhance the separation of DNA sequencing fragments in CE. Experimental conditions, such as electric field strength and column temperature, as well as the internal diameter of the capillary column, have been optimized for this mixed separation matrix. Under these conditions, in combination with energy-transfer (BigDye) dye-labeled primers for a high signal-to-noise ratio and a newly developed expert system for base calling, the electrophoretic separation of 1000 DNA sequencing fragments of both a standard (M13mp18) template and cloned single-stranded templates from human chromosome 17 could be routinely achieved in less than 55 min, with a base-calling accuracy between 98 and 99%. Identical read length, accuracy, and migration time were achieved in more than 300 consecutive runs on a single column.
A method for the cleanup of Sanger DNA sequencing reaction products for capillary electrophoresis analysis with replaceable polymer solutions has been developed. A poly(ether sulfone) ultrafiltration membrane pretreated with linear polyacrylamide was first used to remove template DNA from the sequencing samples. Then, gel filtration in a spin-column format (two columns per sample) was employed to decrease the salt concentration in the sample solution below 10 µM. The method was highly reproducible and increased the injected amount of the sequencing fragments 10- to 50-fold compared with traditional cleanup protocols. Using M13mp18 as the template, the resulting cleaned-up single DNA sequencing fragments could routinely be separated to more than 1000 bases, with a base-calling accuracy of at least 99% over the first 800 bases. The method is simple and universal and can easily be automated. In the following paper, a systematic quantitative study of the effects of sample solution components, such as high-mobility ions (e.g., chloride and dideoxynucleotides) and template DNA, on the injected amount and separation efficiency of the sequencing fragments is presented.
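Why desalting raises the injected amount can be motivated with the standard textbook expression for electrokinetic injection. This is a general relation, not one stated in the abstract, and the symbols are introduced here for illustration:

$$
Q = \left(\mu_{ep} + \mu_{eo}\right) E_{s}\, \pi r^{2}\, t\, C
$$

Here $Q$ is the amount of analyte loaded, $\mu_{ep}$ and $\mu_{eo}$ are the electrophoretic and electroosmotic mobilities, $E_{s}$ is the field strength in the sample solution, $r$ is the capillary radius, $t$ is the injection time, and $C$ is the analyte concentration. At a fixed applied voltage, lowering the sample conductivity (for example, by removing chloride) raises $E_{s}$ roughly in proportion to the ratio of buffer to sample conductivity. Fewer fast co-ions are then available to carry current in place of the DNA fragments, so more fragments are loaded. In the coated capillaries typically used with replaceable polymer matrixes, $\mu_{eo}$ is often assumed to be negligible.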