Targeted nucleases are powerful tools for mediating genome alteration with high precision. The RNA-guided Cas9 nuclease from the microbial clustered regularly interspaced short palindromic repeats (CRISPR) adaptive immune system can be used to facilitate efficient genome engineering in eukaryotic cells by simply specifying a 20-nt targeting sequence within its guide RNA. Here we describe a set of tools for Cas9-mediated genome editing via nonhomologous end joining (NHEJ) or homology-directed repair (HDR) in mammalian cells, as well as generation of modified cell lines for downstream functional studies. To minimize off-target cleavage, we further describe a double-nicking strategy using the Cas9 nickase mutant with paired guide RNAs. This protocol provides experimentally derived guidelines for the selection of target sites, evaluation of cleavage efficiency and analysis of off-target activity. Beginning with target design, gene modifications can be achieved within as little as 1–2 weeks, and modified clonal cell lines can be derived within 2–3 weeks.
The Streptococcus pyogenes Cas9 (SpCas9) nuclease can be efficiently targeted to genomic loci by means of single-guide RNAs (sgRNAs) to enable genome editing1–10. Here, we characterize SpCas9 targeting specificity in human cells to inform the selection of target sites and avoid off-target effects. Our study evaluates >700 guide RNA variants and SpCas9-induced indel mutation levels at >100 predicted genomic off-target loci in 293T and 293FT cells. We find that SpCas9 tolerates mismatches between guide RNA and target DNA at different positions in a sequence-dependent manner, sensitive to the number, position and distribution of mismatches. We also show that SpCas9-mediated cleavage is unaffected by DNA methylation and that the dosage of SpCas9 and sgRNA can be titrated to minimize off-target modification. To facilitate mammalian genome engineering applications, we provide a web-based software tool to guide the selection and validation of target sequences as well as off-target analyses.
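The number and position of guide-target mismatches mentioned above can be tabulated with a few lines of code. The sketch below is illustrative only: the function name, the 1-based numbering from the 5' end, and the example sequences are assumptions, and it does not reproduce the scoring used by the web tool the abstract describes.

```python
def mismatch_positions(guide: str, site: str) -> list[int]:
    """Return 1-based positions (from the 5' end) where guide and site differ.

    Both sequences are compared in the DNA alphabet, so any U in the
    guide is treated as T. This is a hypothetical helper, not part of
    any published tool.
    """
    g = guide.upper().replace("U", "T")
    s = site.upper().replace("U", "T")
    if len(g) != len(s):
        raise ValueError("guide and site must have the same length")
    return [i + 1 for i, (a, b) in enumerate(zip(g, s)) if a != b]


# Illustrative 20-nt sequences (assumed, not taken from the study).
guide = "GAGTCCGAGCAGAAGAAGAA"
candidate_site = "GAATCCGAGCAGAAGAACAA"

positions = mismatch_positions(guide, candidate_site)
print(len(positions), positions)
```

A candidate site could then be ranked by how many mismatches it carries and where they fall, which are the properties the abstract reports as affecting tolerance.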
The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that contribute to individual risk, has been long debated. Genome-wide association studies have identified scores of common variants associated with type 2 diabetes, but in aggregate, these explain only a fraction of heritability. To test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES consortia performed whole genome sequencing in 2,657 Europeans with and without diabetes, and exome sequencing in a total of 12,940 subjects from five ancestral groups. To increase statistical power, we expanded sample size via genotyping and imputation in a further 111,548 subjects. Variants associated with type 2 diabetes after sequencing were overwhelmingly common and most fell within regions previously identified by genome-wide association studies. Comprehensive enumeration of sequence variation is necessary to identify functional alleles that provide important clues to disease pathophysiology, but large-scale sequencing does not support a major role for lower-frequency variants in predisposition to type 2 diabetes.
Clinical laboratory tests are a critical component of the continuum of care. We evaluate the genetic basis of 35 blood and urine laboratory measurements in the UK Biobank (n=363,228 individuals). We identify 1,857 loci associated with at least one trait, containing 3,374 fine-mapped associations, and additional sets of large-effect (> 0.1 sd) protein-altering, HLA, and copy-number variant associations. Through Mendelian Randomization analysis, we discover 51 causal relationships, including previously known agonistic effects of urate on gout and cystatin C on stroke. Finally, we develop polygenic risk scores (PRSs) for each biomarker and build ‘multi-PRS’ models for diseases using 35 PRSs simultaneously, which improve chronic kidney disease, type 2 diabetes, gout, and alcoholic cirrhosis genetic risk stratification in an independent dataset (FinnGen; n=135,500) relative to single-disease PRSs. Together, our results delineate the genetic basis of biomarkers and their causal influences on diseases, and improve genetic risk stratification for common diseases.
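As a rough illustration of the idea, a polygenic risk score is commonly computed as a weighted sum of allele dosages, and several such scores can be combined into a single predictor. The sketch below assumes a simple linear combination; the abstract does not specify the form of the multi-PRS model, and all names, weights, and values here are hypothetical.

```python
def polygenic_score(dosages: list[float], weights: list[float]) -> float:
    """Weighted sum of allele dosages (0, 1 or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def combine_scores(scores: list[float], coefficients: list[float],
                   intercept: float = 0.0) -> float:
    """Linear combination of several PRSs (an assumed model form)."""
    if len(scores) != len(coefficients):
        raise ValueError("scores and coefficients must have the same length")
    return intercept + sum(s * c for s, c in zip(scores, coefficients))


# Made-up dosages and effect-size weights for three variants.
biomarker_prs = polygenic_score([0, 1, 2], [0.5, -0.25, 0.5])
print(biomarker_prs)

# Combining two hypothetical biomarker PRSs with made-up coefficients.
print(combine_scores([biomarker_prs, 1.0], [2.0, 0.5]))
```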
IMPORTANCE Data sets linking comprehensive genomic profiling (CGP) to clinical outcomes may accelerate precision medicine. OBJECTIVE To assess whether a database that combines EHR-derived clinical data with CGP can identify and extend associations in non-small cell lung cancer (NSCLC). DESIGN, SETTING, AND PARTICIPANTS Clinical data from EHRs were linked with CGP results for 28 998 patients from 275 US oncology practices. Among 4064 patients with NSCLC, exploratory associations between tumor genomics and patient characteristics with clinical outcomes were conducted, with data obtained between January 1, 2011, and January 1, 2018. EXPOSURES Tumor CGP, including presence of a driver alteration (a pathogenic or likely pathogenic alteration in a gene shown to drive tumor growth); tumor mutation burden (TMB), defined as the number of mutations per megabase; and clinical characteristics gathered from EHRs. MAIN OUTCOMES AND MEASURES Overall survival (OS), time receiving therapy, maximal therapy response (as documented by the treating physician in the EHR), and clinical benefit rate (fraction of patients with stable disease, partial response, or complete response) to therapy. RESULTS Among 4064 patients with NSCLC (median age, 66.0 years; 51.9% female), 3183 (78.3%) had a history of smoking, 3153 (77.6%) had nonsquamous cancer, and 871 (21.4%) had an alteration in EGFR, ALK, or ROS1 (701 [17.2%] with EGFR, 128 [3.1%] with ALK, and 42 [1.0%] with ROS1 alterations). There were 1946 deaths in 7 years. For patients with a driver alteration, improved OS was observed among those treated with (n = 575) vs not treated with (n = 560) targeted therapies (median, 18.6 months [95% CI, 15.2-21.7] vs 11.4 months [95% CI, 9.7-12.5] from advanced diagnosis; P < .001).
TMB (in mutations/Mb) was significantly higher among smokers vs nonsmokers (8.7 [IQR,] vs 2.6 [IQR, 1.7-5.2]; P < .001) and significantly lower among patients with vs without an alteration in EGFR (3.5 [IQR, 1.76-6.1] vs 7.8 [IQR, 3.5-13.9]; P < .001), ALK (2.1 [IQR, 0.9-4.0] vs 7.0 [IQR, 3.5-13.0]; P < .001), RET (4.6 [IQR,] vs 7.0 [IQR, 2.6-13.0]; P = .004), or ROS1 (4.0 [IQR, 1.2-9.6] vs 7.0 [IQR, 2.6-13.0]; P = .03). In patients treated with anti-PD-1/PD-L1 therapies (n = 1290, 31.7%), TMB of 20 or more was significantly associated with improved OS from therapy initiation (16.8 months [95% CI, 11.6-24.9] vs 8.5 months [95% CI, 7.6-9.7]; P < .001), longer time receiving therapy (7.8 months [95% CI, 5.5-11.1] vs 3.3 months [95% CI, 2.8-3.7]; P < .001), and increased clinical benefit rate (80.7% vs 56.7%; P < .001) vs TMB less than 20. CONCLUSIONS AND RELEVANCE Among patients with NSCLC included in a longitudinal database of clinical data linked to CGP results from routine care, exploratory analyses replicated previously described associations between clinical and genomic characteristics, between driver mutations and response to targeted therapy, and between TMB and response to immunotherapy. These findings demonstrate the feasibility of creating a clinicogenomic database der...
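The two derived measures defined in this abstract, TMB (mutations per megabase) and clinical benefit rate (fraction of patients with stable disease, partial response, or complete response), are simple ratios. The following minimal sketch shows how they could be computed; the function names, response labels, and input values are assumptions for illustration and are not taken from the study's data or pipeline.

```python
# Response categories counted as clinical benefit, per the abstract's definition.
BENEFIT_RESPONSES = {"stable disease", "partial response", "complete response"}


def tumor_mutation_burden(n_mutations: int, sequenced_megabases: float) -> float:
    """Mutations per megabase of sequenced territory."""
    if sequenced_megabases <= 0:
        raise ValueError("sequenced_megabases must be positive")
    return n_mutations / sequenced_megabases


def clinical_benefit_rate(responses: list[str]) -> float:
    """Fraction of patients whose best response counts as clinical benefit."""
    if not responses:
        raise ValueError("responses must not be empty")
    benefited = sum(r.lower() in BENEFIT_RESPONSES for r in responses)
    return benefited / len(responses)


# Made-up inputs: 30 mutations over a hypothetical 1.5 Mb panel.
tmb = tumor_mutation_burden(30, 1.5)
print(tmb, tmb >= 20)  # the abstract compares TMB of 20 or more vs less than 20

print(clinical_benefit_rate([
    "stable disease", "progressive disease",
    "partial response", "complete response",
]))
```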
The genetic architecture of human diseases governs the success of genetic mapping and the future of personalized medicine. Although numerous studies have queried the genetic basis of common disease, contradictory hypotheses have been advocated about features of genetic architecture (e.g., the contribution of rare vs. common variants). We developed an integrated simulation framework, calibrated to empirical data, to enable systematic evaluation of such hypotheses. For type 2 diabetes (T2D), two simple parameters – (a) the target size for causal mutation and (b) the coupling between selection and phenotypic effect – define a broad space of architectures. While extreme models are excluded, many models remain consistent with epidemiology, linkage, and genome-wide association studies for T2D, including those where rare variants explain little (<25%) or most (>80%) of heritability. Ongoing sequencing and genotyping studies will further constrain architecture, but very large samples (e.g., >250K unselected individuals) will be required to localize most of the heritability underlying traits like T2D.
Genome and exome sequencing in large cohorts enables characterization of the role of rare variation in complex diseases. Success in this endeavor, however, requires investigators to test a diverse array of genetic hypotheses which differ in the number, frequency and effect sizes of underlying causal variants. In this study, we evaluated the power of gene-based association methods to interrogate such hypotheses, and examined the implications for study design. We developed a flexible simulation approach, using 1000 Genomes data, to (a) generate sequence variation at human genes in up to 10K case-control samples, and (b) quantify the statistical power of a panel of widely used gene-based association tests under a variety of allelic architectures, locus effect sizes, and significance thresholds. For loci explaining ~1% of phenotypic variance underlying a common dichotomous trait, we find that all methods have low absolute power to achieve exome-wide significance (~5-20% power at α=2.5×10⁻⁶) in 3K individuals; even in 10K samples, power is modest (~60%). The combined application of multiple methods increases sensitivity, but does so at the expense of a higher false positive rate. MiST, SKAT-O, and KBAC have the highest individual mean power across simulated datasets, but we observe wide architecture-dependent variability in the individual loci detected by each test, suggesting that inferences about disease architecture from analysis of sequencing studies can differ depending on which methods are used. Our results imply that tens of thousands of individuals, extensive functional annotation, or highly targeted hypothesis testing will be required to confidently detect or exclude rare variant signals at complex disease loci.
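The exome-wide threshold quoted above matches a Bonferroni correction of a 0.05 family-wise error rate across roughly 20,000 genes, and power in a simulation study is typically estimated as the fraction of simulated datasets whose p-value falls below that threshold. The sketch below assumes both of those conventions; the abstract does not state how the threshold was derived, and the p-values are made up.

```python
import math


def bonferroni_alpha(family_wise_alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return family_wise_alpha / n_tests


def empirical_power(p_values: list[float], alpha: float) -> float:
    """Fraction of simulated replicates reaching significance at alpha."""
    if not p_values:
        raise ValueError("p_values must not be empty")
    return sum(p < alpha for p in p_values) / len(p_values)


# Assumed: 0.05 family-wise error rate spread over ~20,000 gene-based tests.
alpha = bonferroni_alpha(0.05, 20_000)
print(math.isclose(alpha, 2.5e-6))

# Made-up p-values from four hypothetical simulated replicates of one locus.
print(empirical_power([1e-7, 1e-3, 2e-6, 0.5], alpha))
```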
Genome sequencing can identify individuals in the general population who harbor rare coding variants in genes for Mendelian disorders1–7 – and who consequently may have increased disease risk. However, previous studies of rare variants in phenotypically extreme individuals have ascertainment bias and may demonstrate inflated effect size estimates8–12. We sequenced seven genes for maturity-onset diabetes of the young (MODY)13 in well-phenotyped population samples14,15 (n=4,003). Rare variants were filtered according to prediction criteria used to identify disease-causing mutations: i) previously-reported in MODY, and ii) stringent de novo thresholds satisfied (rare, conserved, protein damaging). Approximately 1.5% and 0.5% of randomly selected Framingham and Jackson Heart Study individuals carried variants from these two classes, respectively. However, the vast majority of carriers remained euglycemic through middle age. Accurate estimates of variant effect sizes from population-based sequencing are needed to avoid falsely predicting a significant fraction of individuals as at risk for MODY or other Mendelian diseases.