The vast majority of coding variants are rare, and assessment of the contribution of rare variants to complex traits is hampered by low statistical power and limited functional data. Improved methods for predicting the pathogenicity of rare coding variants are needed to facilitate the discovery of disease variants from exome sequencing studies. We developed REVEL (rare exome variant ensemble learner), an ensemble method for predicting the pathogenicity of missense variants on the basis of individual tools: MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, and phastCons. REVEL was trained with recently discovered pathogenic and rare neutral missense variants, excluding those previously used to train its constituent tools. When applied to two independent test sets, REVEL had the best overall performance (p < 10⁻¹²) as compared to any individual tool and seven ensemble methods: MetaSVM, MetaLR, KGGSeq, Condel, CADD, DANN, and Eigen. Importantly, REVEL also had the best performance for distinguishing pathogenic from rare neutral variants with allele frequencies <0.5%. The area under the receiver operating characteristic curve (AUC) for REVEL was 0.046-0.182 higher in an independent test set of 935 recent SwissVar disease variants and 123,935 putatively neutral exome sequencing variants and 0.027-0.143 higher in an independent test set of 1,953 pathogenic and 2,406 benign variants recently reported in ClinVar than the AUCs for other ensemble methods. We provide pre-computed REVEL scores for all possible human missense variants to facilitate the identification of pathogenic variants in the sea of rare variants discovered as sequencing studies expand in scale.
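The AUC comparison reported above can be illustrated with a minimal sketch. It computes the AUC as the probability that a randomly chosen pathogenic variant scores higher than a randomly chosen neutral one (the Mann-Whitney formulation). The function name and the scores are hypothetical and are not taken from the REVEL study.

```python
def auc(pos_scores, neg_scores):
    """AUC as P(random positive outscores random negative); ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical pathogenicity scores, for illustration only.
pathogenic = [0.9, 0.8, 0.4]
neutral = [0.1, 0.5, 0.3, 0.2]
example_auc = auc(pathogenic, neutral)  # 11 of 12 pairs are ranked correctly
```

The pairwise loop is quadratic in the number of variants; for test sets as large as those described above, a rank-based implementation would be the practical choice.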
We report a genome-wide association study (GWAS) of cutaneous squamous cell carcinoma (SCC) conducted among non-Hispanic white (NHW) members of the Kaiser Permanente Northern California (KPNC) health care system. The study includes a genome-wide screen of 61,457 members (6,891 cases and 54,566 controls) genotyped on the Affymetrix Axiom European array and a replication phase involving an independent set of 6,410 additional members (810 cases and 5,600 controls). Combined analysis of screening and replication phases identified ten loci containing single-nucleotide polymorphisms (SNPs) with P-values < 5×10⁻⁸. Six loci contain genes in the pigmentation pathway; SNPs at these loci appear to modulate SCC risk independently of the pigmentation phenotypes. Another locus contains HLA class II genes studied in relation to elevated SCC risk following immunosuppression. SNPs at the remaining three loci include an intronic SNP in FOXP1 at locus 3p13, an intergenic SNP at 3q28 near TP63, and an intergenic SNP at 9p22 near BNC2. These findings provide insights into the genetic factors accounting for inherited SCC susceptibility.
Purpose: Limb Girdle Muscular Dystrophies (LGMD) are a genetically heterogeneous category of autosomal inherited muscle diseases. Many genes causing LGMD have been identified, and clinical trials are beginning for treatment of some genetic subtypes. However, even with the gene-level mechanisms known, it is still difficult to obtain a reliable and generalizable prevalence estimate for each subtype because of the limited epidemiological data and the low incidence of LGMDs. Methods: Taking advantage of recently published whole exome and genome sequencing data from the general population, we used a Bayesian method to develop a reliable disease prevalence estimator. Results: This method was applied to nine recessive LGMD subtypes. The prevalence estimates calculated by this method were largely comparable to published estimates from epidemiological studies, but highlighted instances of possible under-diagnosis for LGMD2B and 2L. Conclusion: The increasing size of aggregated population variant databases will allow for robust and reproducible prevalence estimates of recessive disease, which is critical for the strategic design and prioritization of clinical trials.
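As a rough illustration of how population allele counts can yield a recessive-disease prevalence estimate, the sketch below combines a Beta prior on each pathogenic allele frequency with the Hardy-Weinberg expectation that prevalence is about q², where q is the summed pathogenic allele frequency. This is not the authors' estimator: the function names, the Beta(0.5, 0.5) default prior, and the assumptions of random mating, full penetrance, and a known list of pathogenic variants are all choices made here for illustration.

```python
def posterior_mean_af(allele_count, allele_number, prior_a=0.5, prior_b=0.5):
    """Posterior mean allele frequency under a Beta(prior_a, prior_b) prior
    and a binomial likelihood for the observed allele count."""
    return (allele_count + prior_a) / (allele_number + prior_a + prior_b)


def recessive_prevalence(variants, prior_a=0.5, prior_b=0.5):
    """Approximate prevalence as q**2 (Hardy-Weinberg), where q sums the
    posterior mean frequencies of all pathogenic alleles in one gene.

    `variants` is a list of (allele_count, allele_number) pairs.
    """
    q = sum(posterior_mean_af(ac, an, prior_a, prior_b) for ac, an in variants)
    return q * q


# Hypothetical counts: two pathogenic alleles seen 10 and 5 times
# among 100,000 sequenced chromosomes each.
example_variants = [(10, 100_000), (5, 100_000)]
example_prevalence = recessive_prevalence(example_variants)
```

Squaring the summed frequency counts both homozygotes and compound heterozygotes. A fuller Bayesian treatment would propagate the posterior uncertainty in q into an interval for the prevalence rather than plugging in posterior means.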
Supplementary data are available at Bioinformatics online.
Cutaneous squamous cell carcinoma (cSCC) is the second most common cancer among Caucasians in the United States, with rising incidence over the past decade. Treatment for non-melanoma skin cancer, including cSCC, in the United States was estimated to cost $4.8 billion in 2014. Thus, an understanding of cSCC pathogenesis could have important public health implications. Immune function impacts cSCC risk, given that cSCC incidence rates are substantially higher in patients with compromised immune systems. We report a systematic review of published associations between cSCC risk and the human leukocyte antigen (HLA) system. This review includes studies that analyze germline class I and class II HLA allelic variation as well as HLA cell-surface protein expression levels associated with cSCC risk. We propose biological mechanisms for these HLA-cSCC associations based on known mechanisms of HLA involvement in other diseases. The review suggests that immunity regulates the development of cSCC and that HLA-cSCC associations differ between immunocompetent and immunosuppressed patients. This difference may reflect the presence of viral co-factors that affect tumorigenesis in immunosuppressed patients. Finally, we highlight limitations in the literature on HLA-cSCC associations, and suggest directions for future research aimed at understanding, preventing and treating cSCC.
Age is the primary risk factor for many common human diseases. Here, we quantify the relative contributions of genetics and aging to gene expression patterns across 27 tissues from 948 humans. We show that the predictive power of expression quantitative trait loci is impacted by age in many tissues. Jointly modelling the contributions of age and genetics to transcript level variation, we find expression heritability ($h^2$) is consistent among tissues while the contribution of aging varies by >20-fold, with $R^2_{\mathrm{age}} > h^2$ in 5 tissues. We find that while the force of purifying selection is stronger on genes expressed early versus late in life (Medawar’s hypothesis), several highly proliferative tissues exhibit the opposite pattern. These non-Medawarian tissues exhibit high rates of cancer and age-of-expression-associated somatic mutations. In contrast, genes under genetic control are under relaxed constraint. Together, we demonstrate the distinct roles of aging and genetics on expression phenotypes.
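To make the age-related variance measure concrete, the sketch below computes the fraction of variance in one gene's expression explained by a simple linear regression on age. It is only an illustration of an R² statistic; the abstract does not specify the joint model, and the function name and data here are hypothetical.

```python
def r_squared(x, y):
    """Fraction of variance in y explained by a least-squares line on x
    (equal to the squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical donor ages and expression levels for a single gene.
ages = [25, 35, 45, 55, 65, 75]
expression = [5.1, 5.0, 4.6, 4.4, 4.1, 3.9]
r2_age = r_squared(ages, expression)
```

A value computed this way for a given tissue could then be set against an estimate of expression heritability for the same genes, which is the kind of comparison the abstract summarizes.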
Cutaneous squamous cell cancers (cSCCs) present an under-recognized health issue among non-Hispanic whites, one that is likely to increase as populations age. cSCC risks vary considerably among non-Hispanic whites, and this heterogeneity indicates the need for risk-stratified screening strategies that are guided by patients' personal characteristics and clinical histories. Here we describe cSCCscore, a prediction tool that uses patients' covariates and clinical histories to assign them personal probabilities of developing cSCCs within 3 years after risk assessment. cSCCscore uses a statistical model for the occurrence and timing of a patient's cSCCs, whose parameters we estimated using cohort data from 66,995 patients in the Kaiser Permanente Northern California healthcare system. We found that patients' covariates and histories explained approximately 75% of their interpersonal cSCC risk variation. Using cross-validated performance measures, we also found cSCCscore's predictions to be moderately well calibrated to the patients' observed cSCC incidence. Moreover, cSCCscore discriminated well between patients who subsequently did and did not develop a new primary cSCC within 3 years after risk assignment, with area under the receiver operating characteristic curve of approximately 85%. Thus, cSCCscore can facilitate more informed management of non-Hispanic white patients at cSCC risk. cSCCscore's predictions are available at https://researchapps.github.io/cSCCscore/.