Alternative polyadenylation (APA) is an essential RNA processing event that contributes significantly to the regulation of transcriptome diversity and functional dynamics in both animals and plants. Here we review newly developed next-generation sequencing methods for genome-wide profiling of APA sites, bioinformatics pipelines for data processing, and both wet and dry laboratory approaches for APA validation. The library construction methods LITE-Seq (Low-Input 3'-Terminal sequencing) and PAC-seq (PolyA Click sequencing) tag polyA+ cDNA, while BAT-seq (BArcoded, three-prime specific sequencing) and PAPERCLIP (Poly(A) binding Protein-mediated mRNA 3' End Retrieval by CrossLinking ImmunoPrecipitation) enrich polyA+ RNA. Only WTTS-seq (Whole Transcriptome Termini Site sequencing) targets both polyA+ RNA and polyA+ cDNA. A variety of well-established bioinformatics pipelines handle read quality control, mapping, clustering, characterization and pathway analysis. The RHAPA (RNase H alternative polyadenylation assay) and 3'RACE-seq (3' rapid amplification of cDNA end sequencing) methods validate APA sites directly, while WTSS-seq (whole transcriptome start site sequencing), RNA-seq (RNA sequencing) and public APA databases can serve as indirect validation methods. We hope that these tools, pipelines and resources will stimulate broad interest in the research community in investigating the APA events that underlie physiological, pathological and psychological changes, and thereby in understanding how information is transferred from genome to phenome for economically important traits in both animals and plants.
Functional annotation of the bovine genome was performed by characterizing the spectrum of RNA transcription using a multi-omics approach that combined long- and short-read transcript sequencing with orthogonal data to identify promoters and enhancers and to determine the boundaries of open chromatin. A total of 171,985 unique transcripts (50% protein-coding) representing 35,150 unique genes (64% protein-coding) were identified across tissues. Among them, 159,033 transcripts (92% of the total) were structurally validated by independent datasets such as PacBio Iso-seq, ONT-seq, de novo assembled transcripts from RNA-seq, or Ensembl and NCBI gene sets. In addition, all transcripts were supported by extensive independent data from different technologies such as WTTS-seq, RAMPAGE, ChIP-seq, and ATAC-seq. A large proportion of the identified transcripts (69%) were novel, of which 87% were produced by known genes and 13% by novel genes. A median of two 5' untranslated regions was detected per gene, compared with a single one in the Ensembl and NCBI annotations. Around 50% of protein-coding genes in each tissue were bifunctional, transcribing both coding and noncoding isoforms. Furthermore, we identified 3,744 genes that functioned as noncoding genes in fetal tissues but as protein-coding genes in adult tissues. Our new bovine genome annotation extended more than 11,000 known gene borders relative to the Ensembl or NCBI annotations. The resulting bovine transcriptome was integrated with publicly available QTL data to study the tissue-tissue interconnections involved in different traits and to construct the first bovine trait similarity network. These validated results represent a significant improvement over current bovine genome annotations.