R-loops are a prevalent class of non-B DNA structures that form during transcription when the nascent RNA reanneals to the template DNA strand. R-loops have been profiled using the S9.6 antibody to immunoprecipitate DNA:RNA hybrids. S9.6-based DNA:RNA immunoprecipitation (DRIP) techniques revealed that R-loops form dynamically over conserved genic hotspots. We developed an orthogonal profiling methodology that detects R-loops through the long stretches of single-stranded DNA present on the looped-out strand. Non-denaturing sodium bisulfite treatment catalyzes the conversion of unpaired cytosines to uracils, creating permanent genetic tags for the position of an R-loop. Long-read, single-molecule PacBio sequencing allows the identification of R-loop 'footprints' at near-nucleotide resolution, in a strand-specific manner, on single DNA molecules, and at ultra-deep coverage. Single-molecule R-loop footprinting (SMRF-seq) revealed strong agreement between S9.6- and bisulfite-based R-loop mapping and confirmed that R-loops form from unspliced transcripts over genic hotspots. Using the largest single-molecule R-loop dataset to date, we show that individual R-loops generate overlapping sets of molecular clusters that pile up across larger R-loop-prone zones. SMRF-seq further established that R-loop distribution patterns are driven by both intrinsic DNA sequence features and DNA topological constraints, revealing principles of R-loop formation.
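To make the footprinting logic concrete, the Python sketch below calls a footprint on a single read. It is not the SMRF-seq analysis pipeline; it is a minimal illustration under stated assumptions: the read is already aligned to the amplicon reference without gaps, C-to-T mismatches are taken to mean that cytosines on the reference (top) strand were single-stranded, G-to-A mismatches are taken to mean the same for the bottom strand, and a footprint is the longest run of consecutively converted cytosines on the called strand. The function names and the `min_run` threshold are hypothetical.

```python
from typing import List, Optional, Tuple


def conversion_calls(ref: str, read: str, base: str, converted: str) -> List[Tuple[int, bool]]:
    """List (position, was_converted) for every `base` in the reference.

    Assumption: `ref` and `read` are aligned without gaps (equal length).
    """
    if len(ref) != len(read):
        raise ValueError("sketch assumes a gap-free alignment of equal length")
    return [(i, q == converted)
            for i, (r, q) in enumerate(zip(ref.upper(), read.upper()))
            if r == base]


def assign_strand(ref: str, read: str) -> Optional[str]:
    """Guess which strand was single-stranded from the dominant mismatch type.

    C->T mismatches: cytosines on the reference (top) strand were converted.
    G->A mismatches: cytosines on the complementary (bottom) strand were converted.
    """
    c_to_t = sum(conv for _, conv in conversion_calls(ref, read, "C", "T"))
    g_to_a = sum(conv for _, conv in conversion_calls(ref, read, "G", "A"))
    if c_to_t == g_to_a:
        return None  # ambiguous: make no strand call
    return "top" if c_to_t > g_to_a else "bottom"


def call_footprint(ref: str, read: str, min_run: int = 5) -> Optional[Tuple[str, int, int]]:
    """Return (strand, start, end) of the longest run of consecutively converted
    cytosines on the called strand, or None if no run reaches `min_run`
    (a hypothetical threshold, not a published cutoff)."""
    strand = assign_strand(ref, read)
    if strand is None:
        return None
    base, converted = ("C", "T") if strand == "top" else ("G", "A")
    best: Optional[Tuple[int, int, int]] = None  # (run length, start, end)
    run_len, run_start = 0, -1
    for pos, conv in conversion_calls(ref, read, base, converted):
        if conv:
            if run_len == 0:
                run_start = pos
            run_len += 1
            if best is None or run_len > best[0]:
                best = (run_len, run_start, pos)
        else:
            run_len = 0  # an unconverted (protected) cytosine ends the run
    if best is None or best[0] < min_run:
        return None
    return strand, best[1], best[2]


if __name__ == "__main__":
    ref = "ATCCGACTTCAGCATCCGAT"
    read = "ATTTGATTTTAGTATCCGAT"  # first five top-strand Cs converted, last two protected
    print(call_footprint(ref, read))  # ('top', 2, 12)
```

Real reads would also need tolerance for sequencing errors, indels, and sporadic background conversion outside R-loops, all of which this sketch ignores for clarity.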
KEYWORDS: R-loops, DNA:RNA immunoprecipitation, SMRT sequencing, non-denaturing bisulfite conversion, DNA topology, S9.6 antibody
…data analysis steps. Here we present a novel adaptation of R-loop footprinting that permits single-molecule R-loop detection at near-nucleotide resolution, in a strand-specific manner, on long amplicons and at ultra-high coverage. This method and its accompanying computational analysis and data visualization pipeline enable deep, cost-effective R-loop profiling at a range of genomic loci, under any condition and in any genome.
SMRTbell library construction. We used the PacBio RSII system to obtain long-read, single-molecule sequencing of R-loop footprints. Libraries were generated by pooling non-overlapping amplicons (fewer than 20 products per run), adding an equal amount of each. Starting from 1–2 µg of pooled PCR products, samples were concentrated with a 1X AMPure bead wash. Libraries were then built following the PacBio "Procedure & Checklist - 2 kb Template Preparation and Sequencing" protocol (PN 001-143-835-08) with a few modifications: the DNA damage repair step was omitted, AMPure bead washes were performed at 0.8X, and ligation was carried out for 1 hour at 25°C. SMRTbell libraries were quantified, and their size was confirmed by either gel electrophoresis or an Agilent 2100 Bioanalyzer. Libraries were sequenced on a PacBio RSII instrument with 6-hour movie times.
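The pooling step reduces to simple arithmetic. The sketch below assumes that "equal amounts" means equal mass and that each amplicon's concentration has been measured beforehand; it computes how much of each product to add to reach a chosen total within the 1–2 µg range, and the bead volume for a given AMPure ratio (bead volume = ratio × sample volume, so a 0.8X wash of 50 µL uses 40 µL of beads). The names `pool_volumes` and `bead_volume` and the 1,500 ng default are hypothetical.

```python
from typing import Dict


def pool_volumes(conc_ng_per_ul: Dict[str, float],
                 total_ng: float = 1500.0) -> Dict[str, float]:
    """Return the volume (µL) of each amplicon giving equal mass per product.

    Assumes 'equal amounts' means equal mass (ng) and enforces the limits
    stated in the methods: fewer than 20 products and 1-2 µg in total.
    """
    n = len(conc_ng_per_ul)
    if not 0 < n < 20:
        raise ValueError("pool between 1 and 19 amplicons per run")
    if not 1000.0 <= total_ng <= 2000.0:
        raise ValueError("total input should be 1-2 µg (1000-2000 ng)")
    ng_each = total_ng / n  # equal mass contributed by every amplicon
    return {name: ng_each / conc for name, conc in conc_ng_per_ul.items()}


def bead_volume(sample_ul: float, ratio: float) -> float:
    """AMPure bead volume for a given bead:sample ratio (e.g. 1.0 or 0.8)."""
    return sample_ul * ratio


if __name__ == "__main__":
    concs = {"amp1": 50.0, "amp2": 25.0, "amp3": 100.0}  # hypothetical ng/µL
    vols = pool_volumes(concs)
    print(vols)  # {'amp1': 10.0, 'amp2': 20.0, 'amp3': 5.0}
    print(bead_volume(sum(vols.values()), 1.0))  # 1X wash of the 35 µL pool: 35.0
```

Fixing the total mass rather than the volume keeps every amplicon's share identical regardless of how concentrated each PCR product is.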
R-loop profiling using DRIPc-seq. DRIPc-seq was applied to NTERA-2 cells as described [3], except that a Ribonuclease A pretreatment was applied to the extracted nucleic...