R-loops are three-stranded nucleic acid structures formed upon annealing of an RNA strand to one strand of duplex DNA. We profiled R-loops using a high-resolution, strand-specific methodology in human and mouse cell types. R-loops are prevalent, collectively occupying up to 5% of mammalian genomes. R-loop formation occurs over conserved genic hotspots such as promoter and terminator regions of poly(A)-dependent genes. In most cases, R-loops occur co-transcriptionally and undergo dynamic turnover. Detailed epigenomic profiling revealed that R-loops associate with specific chromatin signatures. At promoters, R-loops associate with a hyper-accessible state characteristic of unmethylated CpG island promoters. By contrast, terminal R-loops associate with an enhancer- and insulator-like state and define a broad class of transcription terminators. Altogether, this suggests that the retention of nascent RNA transcripts at their site of expression represents an abundant, dynamic, and programmed component of the mammalian chromatin that impacts chromatin patterning and the control of gene expression.
An increasing number of studies integrate mRNA sequencing data into MS-based proteomics to complement the translation product search space. However, several factors, including extensive regulation of mRNA translation and the need for three- or six-frame translation, impede the use of mRNA-seq data for the construction of a protein sequence search database. With that in mind, we developed the PROTEOFORMER tool, which automatically processes data from the recently developed ribosome profiling method (sequencing of ribosome-protected mRNA fragments), resulting in genome-wide visualization of ribosome occupancy. Our tool also includes a translation initiation site calling algorithm allowing the delineation of the open reading frames (ORFs) of all translation products. A complete protein synthesis-based sequence database can thus be compiled for mass spectrometry-based identification. This approach increases the overall protein identification rates by 3% and 11% (improved and new identifications) for human and mouse, respectively, and enables proteome-wide detection of 5′-extended proteoforms, upstream ORF translation and near-cognate translation start sites. The PROTEOFORMER tool is available as a stand-alone pipeline and has been implemented in the Galaxy framework for ease of use.
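To illustrate why six-frame translation enlarges the search space, the following is a minimal sketch of translating a nucleotide sequence in all six reading frames. It is not part of PROTEOFORMER; the function names (`translate`, `six_frame_translate`) and the example sequence are hypothetical, and the standard genetic code is assumed.

```python
# Minimal sketch (assumption: standard genetic code, DNA alphabet ACGT).
# Function names are illustrative and are not taken from PROTEOFORMER.

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translate(seq: str) -> dict:
    """Translate a sequence in three forward and three reverse frames."""
    seq = seq.upper()
    rev = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


if __name__ == "__main__":
    for frame, peptide in six_frame_translate("ATGGCCTAA").items():
        print(frame, peptide)
```

Every input sequence yields six candidate protein sequences, only some of which are actually translated. That redundancy is the burden that a ribosome-profiling-based database is meant to avoid.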
Next-generation transcriptome sequencing is increasingly integrated with mass spectrometry to enhance MS-based protein and peptide identification. Recently, a breakthrough in transcriptome analysis was achieved with the development of ribosome profiling (ribo-seq). This technology is based on the deep sequencing of ribosome-protected mRNA fragments, thereby enabling the direct observation of in vivo protein synthesis at the transcript level. To explore the impact of a ribo-seq-derived protein sequence search space on MS/MS spectrum identification, we performed a comprehensive proteome study on a human cancer cell line, using both shotgun and N-terminal proteomics alongside ribosome profiling, which was used to delineate (alternative) translational reading frames. By including protein-level evidence of sample-specific genetic variation and alternative translation, this strategy improved the identification score of 69 proteins and identified 22 new proteins in the shotgun experiment. Furthermore, we discovered 18 new alternative translation start sites in the N-terminal proteomics data and observed a correlation between the quantitative measures of ribo-seq and shotgun proteomics, with a Pearson correlation coefficient ranging from 0.483 to 0.664. Overall, this study demonstrated the benefits of ribosome profiling for MS-based protein and peptide identification, and we believe this approach could develop into a common practice for next-generation proteomics.
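The Pearson correlation coefficient mentioned above can be computed as in the sketch below. This is a generic illustration rather than the study's analysis code; the function name `pearson` and the per-protein values in the usage example are hypothetical.

```python
# Minimal sketch of the Pearson correlation coefficient.
# The example values are made up for illustration and are not study data.
import math


def pearson(x, y):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Unnormalized standard deviations; the 1/n factors cancel in the ratio.
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Hypothetical per-protein abundance estimates from two methods.
    ribo_seq = [2.1, 3.4, 1.2, 5.6, 4.0]
    shotgun = [1.8, 3.9, 1.5, 4.7, 4.4]
    print(round(pearson(ribo_seq, shotgun), 3))
```

A coefficient near 1 indicates that the two measures rise and fall together, while a value near 0 indicates no linear relationship.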
Genomic imprinting plays an important role in growth and development. Loss of imprinting (LOI) has been found in cancer, yet systematic studies are impeded by data-analytical challenges. We developed a methodology to detect monoallelically expressed loci without requiring genotyping data, and applied it to The Cancer Genome Atlas (TCGA, discovery) and Genotype-Tissue Expression project (GTEx, validation) breast tissue RNA-seq data. Here, we report the identification of 30 putatively imprinted genes in breast. In breast cancer (TCGA), HM13 is characterized by LOI and expression upregulation, which is linked to DNA demethylation. Other imprinted genes typically demonstrate lower expression in cancer, often associated with copy number variation and aberrant DNA methylation. Downregulation in cancer frequently leads to higher relative expression of the (imperfectly) silenced allele, yet this is not considered canonical LOI given the lack of (absolute) re-expression. In summary, our novel methodology highlights the massive deregulation of imprinting in breast cancer.