2019
DOI: 10.1101/780015
Preprint

Illuminating the dark side of the human transcriptome with TAMA Iso-Seq analysis

Abstract: The human transcriptome is one of the most well-annotated of the eukaryotic species. However, limitations in technology biased discovery toward protein coding spliced genes. Accurate high throughput long read RNA sequencing now has the potential to investigate genes that were previously undetectable. Using our Transcriptome Annotation by Modular Algorithms (TAMA) tool kit to analyze the Pacific Bioscience Universal Human Reference RNA Sequel II Iso-Seq dataset, we discovered thousands of potential novel genes …

Citations: cited by 15 publications (21 citation statements)
References: 36 publications (31 reference statements)
“…All Iso-Seq data was first processed using software IsoSeq v3.1 to obtain full-length non-concatemer reads with at least 3 full sequencing passes, which were then mapped to the sheep reference genome GCA_002742125.1 using GMAP version 2018-05-30. TAMA Collapse from the TAMA tool kit 83 was used to generate unique gene and transcript models, which were further merged with RNAseq-based annotation data using TAMA Merge to incorporate any transcript models that were identified by RNAseq but not Iso-Seq. Functional annotation of transcripts was carried out using Trinotate (v3.1.1).…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…RNA-seq reads were stringently mapped using HISAT2 (v.2.0.0) 79. Transcriptomic data were processed using TAMA 80. All transcriptomic, homology-based and ab initio evidence were integrated into a consensus gene annotation using EVidenceModeller (v.1.1.1) 81.…”
Section: Gene Annotation (citation type: mentioning)
confidence: 99%
“…Transcripts were assembled with settings -f 0.05 as the threshold for isoforms expression and the gtf-files were merged with stringtie --merge. Assembled transcripts were processed with TAMA tools [47] for ORF detection and BLAST parsing to identify coding regions. We used curated proteins from Uniprot_Swissprot together with proteins from the latest ENSEMBL dog annotation (v100) and selected the longest blast hit from the top 5 hits with an E-value below 10^-10 as the id of the protein.…”
Section: Gene Annotation (citation type: mentioning)
confidence: 99%
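
The hit-selection rule quoted in the last statement (longest BLAST hit among the top 5 hits with an E-value below 10^-10) can be sketched roughly as follows. This is an illustrative reading, not the citing authors' code and not part of TAMA. The `BlastHit` type, the `pick_protein_id` function, and the choice to filter by E-value first and then rank by ascending E-value are all assumptions. The statement does not say how "top 5" is ordered or how "longest" is measured, so alignment length is used here.

```python
from typing import Iterable, NamedTuple, Optional


class BlastHit(NamedTuple):
    """Hypothetical container for one BLAST hit (not a TAMA or BLAST type)."""
    subject_id: str    # identifier of the matched protein
    evalue: float      # BLAST E-value of the hit
    align_length: int  # alignment length, used here as the "length" of a hit


def pick_protein_id(
    hits: Iterable[BlastHit],
    max_evalue: float = 1e-10,
    top_n: int = 5,
) -> Optional[str]:
    """Return the subject id of the longest hit among the top_n passing hits.

    Assumed reading of the quoted rule: keep hits with E-value below
    max_evalue, rank them by ascending E-value, take the first top_n,
    and choose the one with the longest alignment. Returns None if no
    hit passes the E-value threshold.
    """
    passing = [h for h in hits if h.evalue < max_evalue]
    top = sorted(passing, key=lambda h: h.evalue)[:top_n]
    if not top:
        return None
    return max(top, key=lambda h: h.align_length).subject_id


# Usage with made-up hits: "C" is the longest but fails the E-value cutoff.
example_hits = [
    BlastHit("A", 1e-50, 100),
    BlastHit("B", 1e-30, 300),
    BlastHit("C", 1e-5, 900),
]
chosen = pick_protein_id(example_hits)
```

If the intended order is instead "take the top 5 hits, then discard those above the threshold", the filter and the slice would simply swap places.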