Accumulating evidence from small-scale studies has suggested that allele-specific expression (ASE) plays an important role in tumor initiation and progression. However, little is known about genome-wide ASE in tumors. In this study, we conducted a comprehensive, genome-wide analysis of ASE in individuals with colorectal cancer (CRC). We identified 5.4 thousand genome-wide ASEs of single nucleotide variations (SNVs) from tumor and normal tissues of 59 individuals with CRC. We observed an increased ASE level in tumor samples and found that the ASEs were enriched in hotspots on the genome. Around 63% of the genes located in these hotspots were previously reported to contain complex regulatory elements, e.g., human leukocyte antigen (HLA), or were implicated in tumor progression. Focusing on the allelic expression of somatic mutations, we found that 37.5% of them exhibited ASE, and genes harboring such somatic mutations were enriched in important pathways implicated in cancers. In addition, by comparing the expected and observed ASE events in tumor samples, we identified 50 tumor-specific ASEs, which possibly contributed to the somatic events in the regulatory regions of the genes and were significantly enriched in known cancer driver genes. By analyzing CRC ASEs from several perspectives, we provide a systematic understanding of how ASE is implicated in both tumor and normal tissues, which will be of critical value in guiding ASE studies in cancer.
Long noncoding RNAs (lncRNAs) are proposed to play essential roles in regulating gene transcription. Moreover, a subset has been implicated in modulating the expression of nearby loci. Here we systematically evaluated the relationship between lncRNAs and their neighboring genes based on transcriptome expression profiles from 4900 samples across 12 cancer types. Our findings reveal that lncRNAs, especially those of high syntenic conservation across species, are spatially correlated with transcription factors (TFs) across the genome. Combining conservation analysis, co-expression analysis, and a causal inference test, we identified a list of 28 lncRNA/TF regulatory pairs across 12 TCGA cancer types, 19 of which were further confirmed in additional cancer cell lines. Several of these pairs, including PVT1/MYC and GATA6-AS1/GATA6, show prior evidence of regulatory relationships. Other candidates, such as LINC00261/FOXA2 and PITRM1-AS1/KLF6, were novel. Our study highlights the significant roles of lncRNAs in tumorigenesis and provides a comprehensive overview of lncRNA regulation of neighboring TF genes in human cancers.
Allele-specific expression (ASE) studies have wide-ranging implications for genome biology and medicine. Whole transcriptome RNA sequencing (RNA-Seq) has emerged as a genome-wide tool for identifying ASE, but suffers from mapping bias favoring reference alleles. Two categories of methods are currently used to reduce the effect of mapping bias on ASE identification: normalizing the RNA allelic ratio with the parallel genomic allelic ratio (pDNAar), and modifying the reference genome so that reads carrying either allele have the same chance of being mapped (mREF). We compared the sensitivity and specificity of both methods with simulated data and demonstrated that pDNAar, though ideally practical, had lower sensitivity because of its lower mapping rate for reads carrying nonreference (alternative) alleles, whereas mREF achieved higher sensitivity and specificity because it maps reads carrying either allele efficiently. Application of the two methods to real sequencing data showed that mREF was able to identify more ASE loci because of its higher mapping efficiency, and was able to correct some seemingly incorrect ASE loci that pDNAar identified owing to its inefficiency in mapping reads carrying alternative alleles. Our study provides useful information for RNA sequencing data processing in the identification of ASE.
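As a rough illustration of the pDNAar idea described above, the sketch below tests the RNA reference-allele count at a heterozygous site against the reference fraction observed in matched DNA, instead of against a fixed 0.5. This is a minimal sketch, not the procedure used in the study: the exact binomial test, the function names, and the read counts are all assumptions made for illustration, and the direct evaluation of the binomial probabilities is only suitable for modest read depths.

```python
from math import comb


def binom_two_sided_p(k, n, p):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k under Binomial(n, p).
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


def ase_pvalue_dna_normalized(rna_ref, rna_alt, dna_ref, dna_alt):
    """Hypothetical helper: test RNA allelic imbalance at one site.

    The null reference-allele fraction is taken from the matched DNA
    reads, so a site whose DNA reads are already skewed (for example
    by mapping bias) is not automatically called as ASE.
    """
    expected_ref_fraction = dna_ref / (dna_ref + dna_alt)
    return binom_two_sided_p(rna_ref, rna_ref + rna_alt, expected_ref_fraction)


if __name__ == "__main__":
    # Illustrative (made-up) read counts: RNA 30 ref / 10 alt.
    print(ase_pvalue_dna_normalized(30, 10, 20, 20))  # DNA balanced
    print(ase_pvalue_dna_normalized(30, 10, 30, 10))  # DNA equally skewed
```

With the same RNA counts, the second call returns a much larger p-value than the first because the DNA reads show the same skew. That is the behavior this kind of normalization is meant to produce.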
This article contains data related to the research article entitled “Systematic analysis reveals long noncoding RNAs regulating neighboring transcription factors in human cancers” (Liu et al., 2018 in press) [1]. Long noncoding RNAs (lncRNAs) are proposed to play essential roles in modulating the expression of nearby loci. In this study, we systematically investigated the relationship between lncRNAs and their neighboring genes based on the genomic location of genes and the transcriptome expression profiles from TCGA samples across 12 tumor types. Position conservation analysis was applied to find lncRNAs conserved by position across vertebrate species. Gene ontology and enrichment analysis identified transcription factor (TF) genes as a specific type of protein-coding gene that is adjacent to highly positionally conserved lncRNAs. The expression correlation between lncRNAs and their adjacent TFs was assessed across tumors to define significantly co-expressed lncRNA-TF pairs, and a causal inference test (CIT) was used to infer the causal regulation of each lncRNA on its nearby TF genes. A list of candidate lncRNA/TF regulation pairs in tumors is provided.
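The co-expression step described above can be sketched as a rank correlation between the expression of a lncRNA and that of its adjacent TF across samples. This is a minimal sketch under stated assumptions: the abstract does not say which correlation measure was used, so Spearman correlation is an assumption here, the expression values are made up, and the causal inference test (CIT) is not implemented.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Hypothetical expression values for one lncRNA/TF pair in six samples.
    lncrna_expr = [1.2, 3.4, 2.2, 5.1, 4.0, 0.7]
    tf_expr = [10.0, 25.0, 18.0, 40.0, 22.0, 8.0]
    print(spearman(lncrna_expr, tf_expr))
```

In practice a pair would be called co-expressed only after assessing significance and correcting for multiple testing across all pairs, which this sketch leaves out.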
cisASE is freely available at http://lifecenter.sgst.cn/cisASE. Contact: biosinodx@gmail.com or yxli@sibs.ac.cn. Supplementary information: Supplementary data are available at Bioinformatics online.