2015
DOI: 10.1093/bioinformatics/btv422

TEtranscripts: a package for including transposable elements in differential expression analysis of RNA-seq datasets

Abstract: Supplementary data are available at Bioinformatics online.

citations
Cited by 451 publications
(455 citation statements)
references
References 58 publications
“…Gene expression levels were quantified as read counts mapped to NCBI RefSeq gene annotations (Pruitt et al, 2012). TE expression levels—for Alu, L1 and SVA elements—were quantified using reads mapped to RepeatMasker annotations, which were subsequently analyzed with the TEtranscripts package (Jin et al, 2015). The TEtranscripts program uses an expectation maximization (EM) algorithm to choose optimal unique TE locations for multi-mapped reads, thereby allowing for accurate expression level measurements for active TE families.…”
Section: Methods (mentioning)
confidence: 99%
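The citation statement above describes an expectation-maximization (EM) step for handling multi-mapped reads. As a rough illustration of that general idea, the following is a minimal sketch and not the TEtranscripts implementation: it assumes each read is reduced to the set of TE families it aligns to, ignores element length and alignment quality, and distributes ambiguous reads fractionally across families rather than selecting a single location. The function name `em_multimapper_counts` and its parameters are hypothetical.

```python
def em_multimapper_counts(reads, n_iter=50, tol=1e-9):
    """Estimate expected read counts per TE family with a simple EM loop.

    reads: iterable of collections of family names that each read aligns to
           (hypothetical input format; one entry per read).
    Returns a dict mapping family name -> expected (fractional) read count.
    """
    # Drop reads with no alignments and collapse duplicate hits per read.
    reads = [set(hits) for hits in reads if hits]
    if not reads:
        return {}

    families = sorted({f for hits in reads for f in hits})
    n_reads = len(reads)

    # Start from uniform relative abundances.
    theta = {f: 1.0 / len(families) for f in families}
    counts = {}

    for _ in range(n_iter):
        # E-step: split each read across its candidate families in
        # proportion to the current abundance estimates.
        counts = dict.fromkeys(families, 0.0)
        for hits in reads:
            total = sum(theta[f] for f in hits)
            for f in hits:
                counts[f] += theta[f] / total

        # M-step: re-estimate abundances from the fractional counts.
        new_theta = {f: counts[f] / n_reads for f in families}
        delta = max(abs(new_theta[f] - theta[f]) for f in families)
        theta = new_theta
        if delta < tol:
            break

    return counts


if __name__ == "__main__":
    # Three reads unique to L1, one unique to Alu, one ambiguous read.
    example = [["L1"], ["L1"], ["L1"], ["Alu"], ["L1", "Alu"]]
    print(em_multimapper_counts(example))
```

In this toy input the ambiguous read ends up weighted mostly toward L1, because the uniquely mapped reads already suggest L1 is the more abundant family. A production tool would also need to account for factors this sketch leaves out, such as element length and strandedness.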
“…The Cancer Genome Atlas (TCGA) provides access to both transcriptome sequence data (RNA-seq) and whole genome sequence data (DNA-seq) for a number of matched normal and primary tumor sample pairs from individual patients (Weinstein et al, 2013). In addition, recently developed bioinformatics algorithms allow for the detection of TE transcripts directly from RNA-seq data (Jin et al, 2015) as well as for the characterization of novel TE insertions from DNA-seq data (Thung et al, 2014; Sudmant et al, 2015). We took advantage of these developments in order to evaluate the patterns of both TE expression and insertional activity in three cancer types: breast invasive carcinoma, head, and neck squamous cell carcinoma, and lung adenocarcinoma (Figure 1 and Supplementary Figure 1).…”
Section: Introduction (mentioning)
confidence: 99%
“…[6] We compared the performance in terms of running time and quantification accuracy between our proposed pipeline and other tools, including TEtranscripts, HTSeq-count, Cuffdiff and RepEnrich. [14][32][33][34] In the second dataset, we seek to identify new TEs that are associated with Amyotrophic Lateral Sclerosis (ALS). We applied our pipeline to a K562 cell-line RNA-seq dataset from ENCODE (Encyclopedia of DNA Elements, http://encodeproject.org) Consortium (accession ID: ENCBS555BYH).…”
Section: Datasets (mentioning)
confidence: 99%
“…[6][7][8][9] Toward this end, several algorithms and pipelines were proposed to analyze reads files from TE studies. [10][11][12][13][14][15][16] However, most of the tools share some common limitations: 1) discordant read mapping due to increased chance of multiple mapping in repetitive elements from TEs in the same clade, 2) limited scalability for large-scale analysis, and 3) small coverage for the entire TEs defined in the human genome, i.e., a tool used in [16] only considered LINE 1 (Long Interspersed Nuclear Element 1) elements. [17] Among the existing tools, TEtranscripts has performed well on various datasets.…”
Section: Introduction (mentioning)
confidence: 99%
“…We limited our list of potential TEs to those included in TEtranscripts (Jin et al 2015) and RepEnrich (Criscione et al 2014) to enable comparisons between these different programs. Using the selected TE coordinates we generated a BED file using Clean and obtained Fasta sequences using Seek.…”
mentioning
confidence: 99%