A large discrepancy remains between the known genetic contribution to cancer and the portion that can be explained by genomic variants, both inherited and somatic. Recently, understudied repetitive DNA regions called microsatellites have been identified as genetic risk markers for a number of diseases, including various cancers (breast, ovarian and brain). In this study, we demonstrate an integrated process for identifying and further evaluating microsatellite-based risk markers for lung cancer using data from The Cancer Genome Atlas (TCGA) and the 1000 Genomes Project. Comparing whole-exome germline sequencing data from 488 TCGA lung cancer samples to germline exome data from 390 control samples from the 1000 Genomes Project, we identified 119 potentially informative microsatellite loci. These loci distinguished cancer from control samples with sensitivity and specificity both above 0.8. These loci, supplemented with additional loci from other cancers and controls, were then evaluated using a target enrichment kit and sample-multiplexed next-generation sequencing. Thirteen of the 119 risk markers were found to be informative in a well-powered study (>0.99 for a 0.95 confidence interval) using high-depth (579x ± 315) next-generation sequencing of 30 lung cancer and 89 control samples, resulting in a sensitivity of 0.90 and a specificity of 0.94. When 8 loci harvested from the bioinformatic analysis of other cancers are added to the classifier, the sensitivity and specificity rise to 0.93 and 0.97, respectively. Analysis of the genes harboring these loci revealed two genes (ARID1B and REL) and two significantly enriched pathways (chromatin organization and cellular stress response), suggesting that the process of lung carcinogenesis is linked to chromatin remodeling, inflammation, and tumor microenvironment restructuring.
We show that high-depth sequencing enables high-precision, microsatellite-based risk classification. This microsatellite-based platform confirms the potential to create clinically actionable diagnostics for lung cancer.
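For readers less familiar with the two metrics reported above, the following is a minimal sketch of how sensitivity and specificity are computed from classifier outcomes. The function name and the confusion-matrix counts are hypothetical: the counts were chosen only to be consistent with the rounded values reported for 30 cancer and 89 control samples, and are not taken from the study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): fraction of cancer samples called cancer.
    specificity = TN / (TN + FP): fraction of control samples called control.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts (an assumption, not data from the study):
# 30 cancer samples -> 27 called cancer, 3 missed;
# 89 control samples -> 84 called control, 5 called cancer.
sens, spec = sensitivity_specificity(tp=27, fn=3, tn=84, fp=5)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
# prints: sensitivity=0.90, specificity=0.94
```

With only 30 cancer samples, each misclassified sample shifts sensitivity by roughly 0.03, which is worth keeping in mind when comparing the 13-locus and 21-locus classifiers.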
Motivation: The increasing availability of chromatin immunoprecipitation sequencing (ChIP-Seq) data enables us to learn more about the action of transcription factors in the regulation of gene expression. Even though in vivo transcriptional regulation often involves the concerted action of more than one transcription factor, each individual ChIP-Seq dataset usually represents the action of a single transcription factor. Therefore, a relational database in which available ChIP-Seq datasets are curated is essential. Results: We present Expresso (database and webserver) as a tool for the collection and integration of available Arabidopsis ChIP-Seq peak data, which in turn can be linked to a user’s gene expression data. Known target genes of transcription factors were identified by motif analysis of publicly available GEO ChIP-Seq datasets. Expresso currently provides three services: 1) identification of target genes of a given transcription factor; 2) identification of transcription factors that regulate a gene of interest; 3) computation of the correlation between the gene expression of transcription factors and that of their target genes. Availability: Expresso is freely available at http://bioinformatics.cs.vt.edu/expresso/
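The third service, correlating a transcription factor's expression with that of its targets, can be illustrated with a minimal sketch. The abstract does not say which correlation measure Expresso uses, so Pearson correlation is an assumption here, and the function names, gene labels, and expression values are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant expression")
    return cov / math.sqrt(var_x * var_y)


def correlate_tf_with_targets(expression, tf, targets):
    """Map each target gene to its correlation with the TF's expression.

    `expression` maps a gene label to its expression values across the
    same ordered set of samples.
    """
    return {gene: pearson(expression[tf], expression[gene]) for gene in targets}


# Hypothetical expression values across four samples (not real data).
expression = {
    "TF_A": [1.0, 2.0, 3.0, 4.0],
    "TARGET_1": [2.0, 4.0, 6.0, 8.0],  # rises with TF_A
    "TARGET_2": [4.0, 3.0, 2.0, 1.0],  # falls as TF_A rises
}
result = correlate_tf_with_targets(expression, "TF_A", ["TARGET_1", "TARGET_2"])
print(result)
```

In this toy input, a coefficient near +1 is consistent with activation and one near -1 with repression, although correlation alone does not establish a regulatory relationship.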