2017
DOI: 10.1186/s12859-017-1942-z

A modified GC-specific MAKER gene annotation method reveals improved and novel gene predictions of high and low GC content in Oryza sativa

Abstract: Background: Accurate structural annotation depends on well-trained gene prediction programs. Training data for gene prediction programs are often chosen randomly from a subset of high-quality genes that ideally represent the variation found within a genome. One aspect of gene variation is GC content, which differs across species and is bimodal in grass genomes. When gene prediction programs are trained on a subset of grass genes with random GC content, they are effectively being trained on two classes of genes a…
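The abstract's point is that a random training set from a grass genome mixes two GC classes. A minimal sketch of the alternative is to compute the GC fraction of each candidate training gene and split the candidates into high- and low-GC sets, so that each set can train its own predictor. The function names and the 0.55 cutoff below are hypothetical illustrations, not the paper's values; in practice the cutoff would be read off the observed GC distribution (for example, the trough between the two modes).

```python
from typing import Dict, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (case-insensitive).

    Ambiguous bases such as N are ignored; returns 0.0 if no A/C/G/T is present.
    """
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (s.count("G") + s.count("C")) / acgt


def split_by_gc(
    cds: Dict[str, str], threshold: float = 0.55
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Partition candidate training genes into high- and low-GC sets.

    `threshold` is a HYPOTHETICAL cutoff used only for illustration.
    Returns two dicts mapping gene ID -> GC fraction.
    """
    high: Dict[str, float] = {}
    low: Dict[str, float] = {}
    for gene_id, seq in cds.items():
        gc = gc_content(seq)
        (high if gc >= threshold else low)[gene_id] = gc
    return high, low


if __name__ == "__main__":
    toy = {"geneA": "GCGCGGCCGA", "geneB": "ATATGCATTA", "geneC": "GGCCATGCGC"}
    high, low = split_by_gc(toy)
    print(sorted(high), sorted(low))  # ['geneA', 'geneC'] ['geneB']
```

Each resulting set would then be used as a separate training set for the ab initio predictor, instead of pooling both GC classes into one model.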


Cited by 23 publications (27 citation statements)
References 48 publications
“…Homology-based evidence, included 7097 ESTs (downloaded from NCBI EST database on February 9, 2017), protein sequences from Uniprot 51 , a date palm proteome [http://qatar-weill.cornell.edu/research/research-highlights/date-palm-research-program/date-palm-genome-data], an oil palm proteome 52 , and the RNA-Seq derived models from above. Ab initio prediction was performed with Augustus (v. 3.0) trained as described in Bowman et al 53 with gene models produced with StringTie 48 (v. 1.3.2), from the RNA-Seq alignments.…”
Section: Methods; citation type: mentioning; confidence: 99%
“…The raw MAKER2 annotation was parsed, removing models containing TE domains and lacking evidence of transcription or the presence of a Pfam domain as described in Bowman et al 53 . With about 1× of non-organellar single-end WGS Illumina reads, a de novo (non assembly-based) repeat library was produced with RepeatExplorer 54 , and parsed as in Copetti et al 55 .…”
Section: Methods; citation type: mentioning; confidence: 99%
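The filtering step quoted above can be read as two rules: discard any model that contains a TE domain, and discard any model that has neither transcript evidence nor a Pfam domain. A minimal sketch under that reading follows; the `GeneModel` record and its boolean fields are hypothetical stand-ins for whatever a real pipeline derives from the MAKER output and a domain search, and this is one interpretation of the sentence rather than the exact procedure of Bowman et al.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class GeneModel:
    """Hypothetical summary of one MAKER gene model (fields are illustrative)."""
    model_id: str
    has_te_domain: bool            # e.g. a hit to a transposable-element-related domain
    has_transcript_evidence: bool  # supported by aligned transcript/RNA-Seq evidence
    has_pfam_domain: bool          # at least one Pfam domain detected


def keep_model(m: GeneModel) -> bool:
    """Apply the two removal rules as interpreted above."""
    if m.has_te_domain:
        return False  # rule 1: drop models containing TE domains
    # rule 2: drop models lacking BOTH transcript evidence and a Pfam domain
    return m.has_transcript_evidence or m.has_pfam_domain


def filter_models(models: Iterable[GeneModel]) -> List[GeneModel]:
    """Return the models that survive both rules, preserving input order."""
    return [m for m in models if keep_model(m)]
```

Keeping the rules in a single predicate makes it easy to swap in a different reading of the original sentence without touching the surrounding loop.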
“…The EST2genome function in MAKER v2.31 35 was used [to] identify putative bowfin genes based on BLAST 114 and Exonerate 115 transcript alignments. Best-scoring genes with an Annotation Edit Distance of 0.2 or less were used to train Hidden Markov Models with SNAP 116 and AUGUSTUS 117 as described 118 . Ensembl and RefSeq protein sequences from other vertebrate species were used as additional evidence: gar (LepOcu1), coelacanth (LatCha1), mouse (GRCm38.p5), chicken (Gallus_gallus-5.0), human (GRCh38.p10), Xenopus (JGI 4.2), anole lizard (AnoCar2.0), zebrafish (GRCz10), medaka (HdrR), arowana (GCA_001624265.1 ASM162426v1), and elephant shark (GCA_000165045.2 Callorhinchus_milii-6.1.3).…”
Section: Methods; citation type: mentioning; confidence: 99%
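Selecting "best-scoring genes with an Annotation Edit Distance of 0.2 or less" can be done directly on MAKER's GFF3 output, where each mRNA feature carries its score as an `_AED=` attribute in column 9. The sketch below assumes that layout and treats the threshold as inclusive (≤ 0.2, as in this statement); the `training_candidates` helper is a hypothetical name, and a strict `< 0.2` cutoff, as in the next statement, only needs the comparison changed.

```python
from typing import Dict, Iterable, List


def parse_attributes(col9: str) -> Dict[str, str]:
    """Parse a GFF3 column-9 string ("key=value;key=value") into a dict."""
    attrs: Dict[str, str] = {}
    for field in col9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def training_candidates(gff_lines: Iterable[str], max_aed: float = 0.2) -> List[str]:
    """Return IDs of mRNA features whose _AED attribute is <= max_aed."""
    selected: List[str] = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more features
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue  # only transcript-level features carry the AED score
        attrs = parse_attributes(cols[8])
        if "_AED" in attrs and float(attrs["_AED"]) <= max_aed:
            selected.append(attrs.get("ID", ""))
    return selected
```

The selected IDs would then be used to extract the corresponding gene models for SNAP and AUGUSTUS training.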
“…Of the 22,960 gene models generated in the initial MAKER run, 4,224 unique gene models with less than 0.2 of annotation edit distance (AED; Eilbeck et al 2009) were used to train the Hidden Markov model (HMM) for SNAP (ver. 2013-02-16; Korf 2004) and AUGUSTUS (Stanke et al 2004) following Campbell et al (2014) and Bowman et al (2017). MAKER was run again to conduct SNAP- and AUGUSTUS-based gene prediction with the trained HMMs.…”
Section: Methods; citation type: mentioning; confidence: 99%
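The AED threshold cited above (Eilbeck et al 2009) comes from a nucleotide-level comparison of a gene model with its supporting evidence: with sensitivity SN = overlap / evidence length and specificity SP = overlap / prediction length, AED = 1 − (SN + SP) / 2, so 0 means perfect agreement and 1 means no overlap. The sketch below is a simplified illustration of that definition only; it assumes evidence is already merged into one set of covered positions on a single strand, whereas MAKER computes the score from its own evidence alignments.

```python
from typing import List, Set, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (GFF-style) coordinates


def covered(intervals: List[Interval]) -> Set[int]:
    """Set of nucleotide positions covered by a list of intervals (e.g. exons)."""
    positions: Set[int] = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions


def aed(prediction: List[Interval], evidence: List[Interval]) -> float:
    """Annotation Edit Distance: 1 - (SN + SP) / 2 at the nucleotide level."""
    pred = covered(prediction)
    evid = covered(evidence)
    if not pred or not evid:
        return 1.0  # nothing to compare: treat as completely unsupported
    overlap = len(pred & evid)
    sn = overlap / len(evid)  # fraction of evidence recovered by the model
    sp = overlap / len(pred)  # fraction of the model supported by evidence
    return 1.0 - (sn + sp) / 2.0


if __name__ == "__main__":
    print(aed([(1, 100)], [(1, 100)]))   # 0.0  (identical)
    print(aed([(1, 100)], [(51, 150)]))  # 0.5  (half overlap each way)
```

Under this definition, a model whose evidence covers only half of it (SN = 1, SP = 0.5) scores 0.25 and would fall outside the 0.2 training cutoff.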