Single-nucleotide polymorphisms (SNPs) are the most frequent type of variation in the human genome, and they provide powerful tools for a variety of medical genetic studies. In a large-scale survey for SNPs, 2.3 megabases of human genomic DNA was examined by a combination of gel-based sequencing and high-density variation-detection DNA chips. A total of 3241 candidate SNPs were identified. A genetic map was constructed showing the location of 2227 of these SNPs. Prototype genotyping chips were developed that allow simultaneous genotyping of 500 SNPs. The results provide a characterization of human diversity at the nucleotide level and demonstrate the feasibility of large-scale identification of human SNPs.
Dissecting the genetic basis of disease risk requires measuring all forms of genetic variation, including SNPs and copy number variants (CNVs), and is enabled by accurate maps of their locations, frequencies and population-genetic properties. We designed a hybrid genotyping array (Affymetrix SNP 6.0) to simultaneously measure 906,600 SNPs and copy number at 1.8 million genomic locations. By characterizing 270 HapMap samples, we developed a map of human CNV (at 2-kb breakpoint resolution) informed by integer genotypes for 1,320 copy number polymorphisms (CNPs) that segregate at an allele frequency >1%. More than 80% of the sequence in previously reported CNV regions fell outside our estimated CNV boundaries, indicating that large (>100 kb) CNVs affect much less of the genome than initially reported. Approximately 80% of observed copy number differences between pairs of individuals were due to common CNPs with an allele frequency >5%, and more than 99% derived from inheritance rather than new mutation. Most common, diallelic CNPs were in strong linkage disequilibrium with SNPs, and most low-frequency CNVs segregated on specific SNP haplotypes.
Rapid access to genetic information is central to the revolution taking place in molecular genetics. The simultaneous analysis of the entire human mitochondrial genome is described here. DNA arrays containing up to 135,000 probes complementary to the 16.6-kilobase human mitochondrial genome were generated by light-directed chemical synthesis. A two-color labeling scheme was developed that allows simultaneous comparison of a polymorphic target to a reference DNA or RNA. Complete hybridization patterns were revealed in a matter of minutes. Sequence polymorphisms were detected with single-base resolution and unprecedented efficiency. The methods described are generic and can be used to address a variety of questions in molecular genetics including gene expression, genetic linkage, and genetic variability.
Accurate and complete measurement of single nucleotide (SNP) and copy number (CNV) variants, both common and rare, will be required to understand the role of genetic variation in disease. We present Birdsuite, a four-stage analytical framework instantiated in software for deriving integrated and mutually consistent copy number and SNP genotypes. The method sequentially assigns copy number across regions of common copy number polymorphisms (CNPs), calls genotypes of SNPs, identifies rare CNVs via a hidden Markov model (HMM), and generates an integrated sequence and copy number genotype at every locus (for example, including genotypes such as A-null, AAB and BBB in addition to AA, AB and BB calls). Such genotypes more accurately depict the underlying sequence of each individual, reducing the rate of apparent mendelian inconsistencies. The Birdsuite software is applied here to data from the Affymetrix SNP 6.0 array. Additionally, we describe a method, implemented in PLINK, to utilize these combined SNP and CNV genotypes for association testing with a phenotype. URLs: Birdsuite, http://www.broad.mit.edu/mpg/birdsuite/; PLINK CNV analysis tools, http://pngu.mgh.harvard.edu/purcell/plink/cnv; 1000 Genomes, http://www.1000genomes.org. Studies of SNPs and CNVs in human disease have to date been built on different analytical approaches, and somewhat based on conflicting assumptions. Specifically, SNP genotyping methods 1,2 assume that every individual has two copies of each locus, whereas studies of copy number variation assume that individuals vary in their copy number across the genome. Because SNPs and CNVs coexist throughout the genome, they influence one another's measurement, and may act both separately and in concert to influence human phenotypes.
Ignoring CNVs during SNP genotyping results in failure to capture the true underlying sequence at many sites (genotypes like AAB and A), and can create the appearance of violations of mendelian inheritance or Hardy-Weinberg equilibrium where none in fact exists 3,4. Ignoring SNPs in copy number analysis fails to incorporate allele-specific gains and losses, as well as the potential to exploit linkage disequilibrium between CNVs and nearby SNPs. In addition, methods for copy number analysis have not previously separated the ideas of genotyping known copy number polymorphisms (CNPs) from discovery of rare (and thus previously unobserved) copy number variants (CNVs) 5. In the former case, as in SNP genotyping, existing information about known polymorphisms can be used to design arrays, train clustering algorithms and assign a prior probability of aberrant copy number to guide interpretation of measurements. Discovery of rare variants, as in sequence analysis for rare mutations, is a much more difficult problem, both because it is more difficult to detect a single event tha...
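The point about apparent mendelian violations can be made concrete with a small sketch. It is not Birdsuite code; the genotype encoding (a pair of haplotype alleles, with "-" standing for a deleted copy) and the function names are assumptions chosen for illustration, and duplications such as AAB are left out for brevity.

```python
from itertools import product

# Hypothetical encoding: a genotype is a pair of haplotype alleles,
# each "A", "B", or "-" (the locus is deleted on that chromosome).

def possible_children(father, mother):
    """All child genotypes formed by one allele from each parent."""
    return {tuple(sorted(pair)) for pair in product(father, mother)}

def is_mendelian_consistent(father, mother, child):
    """True if the child genotype can arise from the two parents."""
    return tuple(sorted(child)) in possible_children(father, mother)

# Diploid-only calls: a hemizygous A/- father is miscalled AA and a
# hemizygous B/- child is miscalled BB, so the trio looks inconsistent.
diploid_view = is_mendelian_consistent(("A", "A"), ("B", "B"), ("B", "B"))

# Integrated calls: the father is A-null and the child is B-null,
# which is an ordinary transmission of the deletion.
integrated_view = is_mendelian_consistent(("A", "-"), ("B", "B"), ("B", "-"))

print(diploid_view, integrated_view)
```

Under the diploid-only view the trio is flagged as an error, while the copy-number-aware genotypes explain it as inheritance of the deleted allele from the father.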
Accurate identification of tumor-derived somatic variants in plasma circulating cell-free DNA (cfDNA) requires understanding the various biologic compartments contributing to the cfDNA pool. We sought to define the technical feasibility of a high-intensity sequencing assay of cfDNA and matched white-blood cell (WBC) DNA covering a large genomic region (508 genes, 2Mb, >60,000X raw-depth) in a prospective study of 124 metastatic cancer patients, with contemporaneous matched tumor tissue biopsies, and 47 non-cancer controls. The assay displayed a high sensitivity and specificity, allowing for de novo detection of tumor-derived mutations and inference of tumor mutational burden, microsatellite instability, mutational signatures and sources of somatic mutations identified in cfDNA. The vast majority of cfDNA mutations (81.6% in controls and 53.2% in cancer patients) had features consistent with clonal hematopoiesis (CH). This cfDNA sequencing approach revealed that CH constitutes a pervasive biological phenomenon emphasizing the importance of matched cfDNA-WBC sequencing for accurate variant interpretation.
A hierarchy of simple models is used to design robust estimators meeting these goals for both stand alone and comparative experiments. This algorithm has been validated against an extensive panel of known spike experiments, and shows comparable performance to existing standards.
We present rank-based algorithms for making detection and comparison calls on expression microarrays. The detection call algorithm utilizes the discrimination scores. The comparison call algorithm utilizes intensity differences. Both algorithms are based on Wilcoxon's signed-rank test. Several parameters in the algorithms can be adjusted by the user to alter levels of specificity and sensitivity. The algorithms were developed and analyzed using spiked-in genes arrayed in a Latin square format. In the call process, p-values are calculated to give a confidence level for the pertinent hypotheses. For comparison calls made between two arrays, two primary normalization factors are defined. To overcome the difficulty that constant normalization factors do not fit all probe sets, we perturb these primary normalization factors and make increasing or decreasing calls only if all resulting p-values fall within a defined critical region. Our algorithms also automatically handle scanner saturation.
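As a rough illustration of the detection call, the sketch below applies a one-sided Wilcoxon signed-rank test to per-probe-pair discrimination scores. The score definition (PM - MM) / (PM + MM), the threshold `tau`, the cutoffs `alpha1` and `alpha2`, the call labels, and all function names are assumptions made for illustration; the abstract does not specify them. Saturation handling and the perturbed normalization factors used for comparison calls are not covered.

```python
def discrimination_scores(pm, mm):
    """Assumed score per probe pair: (PM - MM) / (PM + MM)."""
    return [(p - m) / (p + m) for p, m in zip(pm, mm)]

def signed_rank_p_value(values, tau=0.0):
    """Exact one-sided p-value for H1: median(values) > tau."""
    diffs = [v - tau for v in values if v != tau]
    n = len(diffs)
    if n == 0:
        return 1.0
    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Null distribution of W+ by counting sign assignments; ranks are
    # doubled so that average ranks such as 1.5 become integers.
    doubled = [int(round(2 * r)) for r in ranks]
    total = sum(doubled)
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in doubled:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    target = int(round(2 * w_plus))
    return sum(counts[target:]) / 2 ** n

def detection_call(pm, mm, tau=0.015, alpha1=0.04, alpha2=0.06):
    """Return ("P" | "M" | "A", p) for present, marginal, or absent."""
    p = signed_rank_p_value(discrimination_scores(pm, mm), tau)
    if p < alpha1:
        return "P", p
    if p < alpha2:
        return "M", p
    return "A", p

pm = [400, 500, 600, 700, 800, 900, 1000, 1100]
mm = [100] * 8
print(detection_call(pm, mm))
```

Raising `tau` or lowering `alpha1` makes the call more conservative, which is one way user-adjustable parameters can trade sensitivity against specificity.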
We present a genotyping method for simultaneously scoring 116,204 SNPs using oligonucleotide arrays. At call rates >99%, reproducibility is >99.97% and accuracy, as measured by inheritance in trios and concordance with the HapMap Project, is >99.7%. Average intermarker distance is 23.6 kb, and 92% of the genome is within 100 kb of a SNP marker. Average heterozygosity is 0.30, with 105,511 SNPs having minor allele frequencies >5%.
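The summary statistics quoted above can be related to per-SNP genotype counts with a short sketch. The function names and the example counts are hypothetical, and the abstract does not say whether its heterozygosity figure is observed or expected, so both are shown.

```python
def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Frequency of the rarer allele, from diploid genotype counts."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_b = (n_ab + 2 * n_bb) / total_alleles
    return min(freq_b, 1 - freq_b)

def observed_heterozygosity(n_aa, n_ab, n_bb):
    """Fraction of samples called heterozygous."""
    return n_ab / (n_aa + n_ab + n_bb)

def expected_heterozygosity(maf):
    """Heterozygosity under Hardy-Weinberg equilibrium: 2p(1 - p)."""
    return 2 * maf * (1 - maf)

# Hypothetical counts for one SNP in 100 samples.
maf = minor_allele_frequency(49, 42, 9)
print(maf, observed_heterozygosity(49, 42, 9), expected_heterozygosity(maf))

# Share of SNPs on the array with minor allele frequency above 5%,
# using the two counts given in the abstract.
print(round(105511 / 116204, 3))
```

With the figures in the abstract, roughly 91% of the 116,204 SNPs have a minor allele frequency above 5%.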