Transcriptomics analysis of various small RNA (sRNA) biotypes is a new and rapidly developing field. Annotations for microRNAs, tRNAs, piRNAs and rRNAs contain information on transcript sequences and loci that is vital for downstream analyses. Several databases have been established to provide this type of data for specific RNA biotypes. However, these sources often contain data in different formats, which makes the bulk analysis of several sRNA biotypes in a single pipeline challenging. Information on some transcripts may be incomplete or conflicting with other entries. To overcome these challenges, we introduce ITAS, or Integrated Transcript Annotation for Small RNA, a filtered, corrected and integrated transcript annotation containing information on several types of small RNAs, including tsRNAs, for several species (Homo sapiens, Rattus norvegicus, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans). ITAS is presented in a format applicable for the vast majority of bioinformatic transcriptomics analysis, and it was tested in several case studies for human-derived data against existing alternative databases.
Expression analysis of small noncoding RNA (sRNA), including microRNA, piwi-interacting RNA, small rRNA-derived RNA, and tRNA-derived small RNA, is a novel and quickly developing field. Despite a range of proposed approaches, selecting and adapting a particular pipeline for transcriptomic analysis of sRNA remains a challenge. This paper focuses on the identification of the optimal pipeline configurations for each step of human sRNA analysis, including reads trimming, filtering, mapping, transcript abundance quantification and differential expression analysis. Based on our study, we suggest the following parameters for the analysis of human sRNA in relation to categorical analyses with two groups of biosamples: (1) trimming with the lower length bound = 15 and the upper length bound = Read length − 40% Adapter length; (2) mapping on a reference genome with bowtie aligner with one mismatch allowed (-v 1 parameter); (3) filtering by mean threshold > 5; (4) analyzing differential expression with DESeq2 with adjusted p-value < 0.05 or limma with p-value < 0.05 if there is very little signal and few transcripts.
Analysis of the expression activity of small noncoding RNA (sRNA), including microRNA, piwi-interacting RNA, small rRNA-derived RNA, and tRNA-derived small RNA, is a novel and quickly developing field. The nature of sRNA calls for bioinformatical approaches, adapted for the specific structure of the data. Despite a number of approaches proposed, selecting and adapting a particular pipeline for transcriptomic analysis of sRNA remains a challenge. This article is dedicated to identifying the optimal pipeline configurations for each step of sRNA analysis, including reads trimming, filtering, mapping, transcript abundance quantification and differential expression analysis. For categorical factors and two groups of biosamples, we suggest approaches for the most crucial stages of sRNA analysis pipeline such as: (1) trimming with the lower length bound = 15 and the upper length bound = Readlength−40%Adapterlength; (2) mapping on a reference genome with bowtie aligner with one mismatch allowed (-v 1 parameter); (3) filtering by mean threshold &gt; 5; and (4) analyzing differential expression with DESeq2 with adjusted p-value &lt; 0.05 or limma with p-value &lt; 0.05 if there is very little signal and few transcripts.
Expression analysis of small noncoding RNA (sRNA), including microRNA, piwi-interacting RNA, small rRNA-derived RNA, and tRNA-derived small RNA, is a novel and quickly developing field. Despite a range of proposed approaches, selecting and adapting a particular pipeline for transcriptomic analysis of sRNA remains a challenge. This paper focuses on the identification of the optimal pipeline configurations for each step of human sRNA analysis, including reads trimming, filtering, mapping, transcript abundance quantification and differential expression analysis. Based on our study, we suggest the following parameters for analysis of human sRNA in relation to categorical analyses with two groups of biosamples: (1) trimming with the lower length bound = 15 and the upper length bound = \(Read\ length - 40\% Adapter\ length\); (2) mapping on a reference genome with bowtie aligner with one mismatch allowed (-v 1 parameter); (3) filtering by mean threshold > 5; and (4) analyzing differential expression with DESeq2 with adjusted p-value < 0.05 or limma with p-value < 0.05 if there is very little signal and few transcripts.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.