G-quadruplex (G4) structures formed by guanine-rich nucleic acids are implicated in essential physiological processes and serve as important drug targets. Genome-wide detection of G4s in living cells is important for exploring their biological role but has not yet been achieved because no suitable G4 probe has been available. We engineered a 6.7 kDa G4 probe (G4P) protein that binds G4s with high affinity and specificity. We used it to capture G4s in living human, mouse, and chicken cells with ChIP-Seq, yielding a genome-wide landscape as well as details on the positions, frequencies, and sequence identities of G4 formation in these cells. Our results indicate that transcription is accompanied by robust formation of G4s in genes. In human cells, we detected up to >123,000 G4 peaks, of which more than one third had a fold increase of ≥5; these peaks were present in >60% of promoters and ~70% of genes. Because the G4P is much smaller than an scFv antibody (27 kDa) or even a nanobody (12-15 kDa), we expect that it may find diverse applications in biology, medicine, and molecular devices as a G4 affinity agent.
...3 proteins. Moreover, the synergy between the two binding domains dramatically improves affinity and selectivity towards G4s. Expression of the G4P in cells followed by chromatin immunoprecipitation (termed G4P-ChIP) allowed us to capture G4s in living human, mouse, and chicken cells through ChIP-Seq, revealing the genome-wide landscape and details on the locations, frequencies, and sequence identities of G4 formation in these cells.
MATERIALS AND METHODS
ChIP-Seq data analysis. Clean paired-end sequencing data in fastq format were mapped to the human genome (hg19, or hg38 when comparing with downloaded hg38 data) using Bowtie2 (12) with the --sensitive-local preset and the --no-unal, --no-discordant, and --no-mixed parameters. Mapped reads were filtered with samtools view (13) using the parameter -q 20 to remove low-quality alignments and with samtools rmdup to remove duplicates, and were then written to bam files. Read enrichment was calculated from the bam files using deeptools (14) plotEnrichment. The bam files were also processed with deeptools bamCompare to produce bigwig coverage files in subtract or ratio mode, normalized to RPKM. Profiles and heatmaps of reads were generated from the bigwig files using deeptools computeMatrix followed by plotProfile and plotHeatmap, respectively, with region bed files derived from the NCBI RefSeq bed file downloaded from the UCSC website (http://genome.ucsc.edu/) unless otherwise indicated. Coordinate duplicates in the bed files were removed. Peaks of read enrichment were identified with macs2 (15) using --qvalue 0.001, --keep-dup 1, and default values for the other parameters. ChIP-Seq data from public repositories were downloaded from the GEO (https://www.ncbi.nlm.nih.gov/geo/) or ENCODE (https://www.encodeproject.org/) database and processed as described above whenever applicable. Original sequencing (fastq) and processed (narrowPeak, bigwig) G4P-ChIP data...
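The mapping, filtering, coverage, and peak-calling steps described in the ChIP-Seq data analysis paragraph can be sketched as a set of command lines. The Python snippet below is only an illustrative sketch, not the authors' actual scripts: it assembles the commands as argument lists without running them. All file names, the helper function names, the intermediate samtools sort step (rmdup expects coordinate-sorted input), the use of a control bam in macs2, and the --scaleFactorsMethod None setting for bamCompare are assumptions that do not appear in the text; only the flags named in the paragraph are taken from the source.

```python
import shlex
from typing import List, Optional

Command = List[str]


def bowtie2_cmd(index: str, fq1: str, fq2: str, sam: str) -> Command:
    """Paired-end mapping with the preset and flags named in the text."""
    return [
        "bowtie2", "--sensitive-local",
        "--no-unal", "--no-discordant", "--no-mixed",
        "-x", index, "-1", fq1, "-2", fq2, "-S", sam,
    ]


def filter_cmds(sam: str, prefix: str) -> List[Command]:
    """Quality filter (-q 20) and duplicate removal (samtools rmdup)."""
    filtered = f"{prefix}.q20.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    dedup = f"{prefix}.dedup.bam"
    return [
        ["samtools", "view", "-b", "-q", "20", "-o", filtered, sam],
        # Assumption: sort by coordinate before rmdup (not stated in the text).
        ["samtools", "sort", "-o", sorted_bam, filtered],
        ["samtools", "rmdup", sorted_bam, dedup],
    ]


def bamcompare_cmd(treat: str, control: str, out_bw: str,
                   operation: str = "subtract") -> Command:
    """bigwig coverage in subtract or ratio mode, normalized to RPKM."""
    if operation not in ("subtract", "ratio"):
        raise ValueError("operation must be 'subtract' or 'ratio'")
    return [
        "bamCompare", "-b1", treat, "-b2", control,
        "--operation", operation,
        # Assumption: disable scale factors so that RPKM normalization applies.
        "--scaleFactorsMethod", "None",
        "--normalizeUsing", "RPKM",
        "-o", out_bw,
    ]


def macs2_cmd(treat: str, name: str, control: Optional[str] = None) -> Command:
    """Peak calling with --qvalue 0.001 and --keep-dup 1, other values default."""
    cmd = ["macs2", "callpeak", "-t", treat, "-n", name,
           "--qvalue", "0.001", "--keep-dup", "1"]
    if control is not None:
        # Assumption: the text does not say whether a control was supplied.
        cmd += ["-c", control]
    return cmd


if __name__ == "__main__":
    # Hypothetical sample and index names, for illustration only.
    steps = [bowtie2_cmd("hg19", "g4p_R1.fastq", "g4p_R2.fastq", "g4p.sam")]
    steps += filter_cmds("g4p.sam", "g4p")
    steps.append(bamcompare_cmd("g4p.dedup.bam", "input.dedup.bam", "g4p.bw"))
    steps.append(macs2_cmd("g4p.dedup.bam", "g4p", control="input.dedup.bam"))
    for step in steps:
        print(shlex.join(step))
```

Building the commands as lists rather than shell strings keeps each parameter explicit and easy to check against the text; the lists can be passed to `subprocess.run` once the paths have been adapted. Note that bamCompare requires indexed bam files, so a `samtools index` call would be needed before that step.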