2018
DOI: 10.1093/gigascience/giy093

Leveraging multiple transcriptome assembly methods for improved gene structure annotation

Abstract:
Background: The performance of RNA sequencing (RNA-seq) aligners and assemblers varies greatly across different organisms and experiments, and often the optimal approach is not known beforehand.
Results: Here, we show that the accuracy of transcript reconstruction can be boosted by combining multiple methods, and we present a novel algorithm to integrate multiple RNA-seq assemblies into a coherent transcript annotation. Our algorithm can remove redundancies and select the best transcript models according to user-sp…
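
The abstract outlines the integration in three moves: collapse redundant models that different assemblers produced, group the remaining models into loci, and keep the best model per locus according to user-specified metrics. The Python sketch below is a toy illustration of that idea under stated assumptions, not Mikado's implementation, whose procedure is considerably more elaborate. The `Transcript` class, the `integrate` function and the example coordinates are hypothetical; redundancy is assumed to mean an identical intron chain (or an identical single exon for mono-exonic models), loci are assumed to be simple same-strand overlap clusters, and the "user-specified metric" is any scoring callable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Exon = Tuple[int, int]  # 1-based, inclusive coordinates


@dataclass(frozen=True)
class Transcript:
    """Hypothetical minimal transcript model."""
    tid: str
    source: str               # assembler that produced the model (illustrative field)
    chrom: str
    strand: str
    exons: Tuple[Exon, ...]   # sorted by start

    @property
    def introns(self) -> Tuple[Exon, ...]:
        # An intron runs from one exon's end + 1 to the next exon's start - 1.
        return tuple((a[1] + 1, b[0] - 1) for a, b in zip(self.exons, self.exons[1:]))

    @property
    def start(self) -> int:
        return self.exons[0][0]

    @property
    def end(self) -> int:
        return self.exons[-1][1]

    @property
    def length(self) -> int:
        return sum(e - s + 1 for s, e in self.exons)


def integrate(models: List[Transcript],
              score: Callable[[Transcript], float]) -> List[Transcript]:
    """Toy integration: dedupe by intron chain, cluster by overlap, keep best per locus."""
    # 1. Redundancy removal (assumed definition): same chromosome, strand and intron
    #    chain, or the same single exon for mono-exonic models; keep the best-scoring copy.
    unique: Dict[tuple, Transcript] = {}
    for t in models:
        if t.introns:
            key = (t.chrom, t.strand, "chain", t.introns)
        else:
            key = (t.chrom, t.strand, "mono", t.exons)
        if key not in unique or score(t) > score(unique[key]):
            unique[key] = t

    # 2. Locus clustering: sweep by position; open a new locus when a model
    #    no longer overlaps the current one on the same chromosome and strand.
    ordered = sorted(unique.values(), key=lambda t: (t.chrom, t.strand, t.start))
    loci: List[List[Transcript]] = []
    locus_end: Optional[int] = None
    for t in ordered:
        same_place = loci and (t.chrom, t.strand) == (loci[-1][0].chrom, loci[-1][0].strand)
        if same_place and t.start <= locus_end:
            loci[-1].append(t)
            locus_end = max(locus_end, t.end)
        else:
            loci.append([t])
            locus_end = t.end

    # 3. Model selection: the user-supplied score picks one model per locus.
    return [max(locus, key=score) for locus in loci]


# Hypothetical models from two assemblers; cl.1 shares st.1's intron chain.
models = [
    Transcript("st.1", "StringTie", "chr1", "+", ((100, 200), (300, 400))),
    Transcript("cl.1", "Cufflinks", "chr1", "+", ((90, 200), (300, 450))),
    Transcript("st.2", "StringTie", "chr1", "+", ((150, 200), (300, 400), (500, 600))),
    Transcript("st.3", "StringTie", "chr1", "+", ((1000, 1500),)),
]
best = integrate(models, score=lambda t: t.length)
print([t.tid for t in best])  # ['cl.1', 'st.3']
```

Here the scoring function simply prefers longer models; swapping in a different callable is how this sketch stands in for the user-specified selection metrics the abstract mentions.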


Cited by 155 publications (123 citation statements); references 43 publications (55 reference statements).
“…We downloaded coding sequences from the C. hirsuta genomic resources web site http://chi.mpipz.mpg.de/download/annotations/carhr38.cds.fa and mapped to C. amara using gmap. The resulting sam file was converted to bam-format, sorted and indexed via samtools (v. 1.7) [58], and then converted to GTF-format via the 'convert' script in Mikado (v1.2.3) [67], which was subsequently used to build a snpEFF (v. 4.3) [68] database. We overlapped the candidate minimum rank sum SNPs with candidate SNPs from fineMAV analysis and annotated each SNP identified by both methods with gene to which it belongs.…”
Section: Window-based Selection Scan Using a Quartet Design; citation type: mentioning
confidence: 99%
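
The final step of the quoted passage is an intersection followed by an interval lookup: keep only the SNPs reported by both methods, then assign each to the gene whose coordinates contain it. The study did the annotation with a snpEff database built from the converted GTF; the Python sketch below swaps in a plain sorted-interval lookup instead, so it is not snpEff's behavior. The function names, SNP positions and gene coordinates are hypothetical, and the lookup assumes genes on one chromosome do not overlap.

```python
import bisect
from typing import Dict, List, Optional, Set, Tuple

Snp = Tuple[str, int]             # (chromosome, 1-based position)
Gene = Tuple[str, int, int, str]  # (chromosome, start, end, gene ID), inclusive
Index = Dict[str, Tuple[List[int], List[Tuple[int, str]]]]


def build_gene_index(genes: List[Gene]) -> Index:
    """Per chromosome: sorted gene starts plus a parallel list of (end, gene ID)."""
    index: Index = {}
    for chrom, start, end, gene_id in sorted(genes):
        starts, rest = index.setdefault(chrom, ([], []))
        starts.append(start)
        rest.append((end, gene_id))
    return index


def gene_at(index: Index, chrom: str, pos: int) -> Optional[str]:
    """Gene containing pos, or None if intergenic (assumes non-overlapping genes)."""
    if chrom not in index:
        return None
    starts, rest = index[chrom]
    i = bisect.bisect_right(starts, pos) - 1  # last gene starting at or before pos
    if i >= 0 and pos <= rest[i][0]:
        return rest[i][1]
    return None


def annotate_shared(snps_a: Set[Snp], snps_b: Set[Snp], genes: List[Gene]) -> Dict[Snp, Optional[str]]:
    """Keep SNPs found by both methods and annotate each with its gene."""
    index = build_gene_index(genes)
    return {snp: gene_at(index, *snp) for snp in sorted(snps_a & snps_b)}


# Hypothetical candidates from the two scans and hypothetical gene models.
rank_sum = {("chr1", 150), ("chr1", 900), ("chr2", 50)}
finemav = {("chr1", 150), ("chr1", 900), ("chr3", 10)}
genes = [("chr1", 100, 200, "geneA"), ("chr1", 300, 800, "geneB")]

print(annotate_shared(rank_sum, finemav, genes))
# {('chr1', 150): 'geneA', ('chr1', 900): None}
```

Sorting the gene starts once makes each lookup a binary search, which matters when the candidate list is long; a real annotation tool would also handle overlapping genes and report the variant's effect, not just the containing gene.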
“…Chinese Spring genome sequence was annotated as described in [25]. Briefly, two gene prediction pipelines were used (TriAnnot: developed at GDEC Institute [INRA-UCA Clermont-Ferrand]; the pipeline developed at Helmholtz Center Munich [PGSB]) and the two annotations were integrated (pipeline established at Earlham Institute [43]) to achieve a single high-quality gene set. TE modeling was achieved through a similarity search approach based on the ClariTeRep curated databank of repeated elements [44], developed specifically for the wheat genome, and with the CLARITE program that was developed to model TEs and reconstruct their nested structure [17].…”
Section: TE Modeling Using CLARITE; citation type: mentioning
confidence: 99%
“…com/urmi-21/pyrpipe/tree/master/case_studies/Athaliana_transcript_assembly E. Integrating pyrpipe scripts within a workflow management system. We embedded pyrpipe into the Snakemake workflow management system (6), and used it to download human RNA-Seq data with SRAtools, quality filter the data with BBDuk (15), align reads with Hisat2 (12), assemble transcripts with StringTie (13) and Cufflinks (17), and merge the multiple assemblies with Mikado (18). Case study: https://github.com/urmi-21/pyrpipe/tree/master/case_studies/Human_annotation_snakemake F. Prediction of Zea mays orphan genes.…”
Section: Case Studies C. Prediction of Long Non-coding RNAs (lncRNAs); citation type: mentioning
confidence: 99%
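
The pyrpipe case study chains steps whose inputs are earlier steps' outputs, and Snakemake derives an execution graph from each rule's declared inputs and outputs, running independent steps concurrently. As a stdlib-only illustration of that ordering idea (not Snakemake's or pyrpipe's API), the sketch below encodes the quoted steps as a hypothetical dependency graph and uses Python's `graphlib.TopologicalSorter` (Python 3.9+) to derive batches of steps that could run in parallel. The step labels are illustrative.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each step maps to the steps whose outputs it consumes (labels are illustrative).
steps: Dict[str, Set[str]] = {
    "download (SRAtools)": set(),
    "quality filter (BBDuk)": {"download (SRAtools)"},
    "align (Hisat2)": {"quality filter (BBDuk)"},
    "assemble (StringTie)": {"align (Hisat2)"},
    "assemble (Cufflinks)": {"align (Hisat2)"},
    "merge (Mikado)": {"assemble (StringTie)", "assemble (Cufflinks)"},
}


def parallel_batches(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group steps into batches; every step in a batch has all its inputs ready."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # also raises CycleError if the graph has a cycle
    batches: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps whose dependencies are all done
        batches.append(ready)
        sorter.done(*ready)                 # mark them finished to unlock successors
    return batches


for batch in parallel_batches(steps):
    print(batch)
# ['download (SRAtools)']
# ['quality filter (BBDuk)']
# ['align (Hisat2)']
# ['assemble (Cufflinks)', 'assemble (StringTie)']
# ['merge (Mikado)']
```

The two assemblers land in the same batch because each depends only on the alignment, and the Mikado merge cannot start until both have finished.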