2015
DOI: 10.7287/peerj.preprints.1296
Preprint

Simultaneous gene finding in multiple genomes

Abstract: As whole genome sequencing is taking on ever-increasing dimensions, the new challenge is the accurate and consistent annotation of entire clades of genomes. We address this problem with a new approach to comparative gene finding that takes a multiple genome alignment of closely related species and simultaneously predicts the location and structure of protein-coding genes in all input genomes, thereby exploiting negative selection and sequence conservation. The model prefers potential gene structures in the dif…

Cited by 13 publications (19 citation statements)
References 8 publications
“…proteins. The accuracy of a gene finding algorithm, which utilizes cross-species proteins mapping, depends strongly on the evolutionary distance between the species [10,26,45].…”
Section: Size Of the Protein Database And Distribution Of Evolutionar… (mentioning)
confidence: 99%
“…On a parallel avenue, yet another algorithm, AUGUSTUS [9][10][11][12][13][14], was demonstrated to be one of the most accurate gene prediction tools [15][16][17]. AUGUSTUS carried a flexible mechanism for integration of external evidence generated by spliced-aligned RNA-Seq reads or homologous proteins into gene prediction.…”
Section: Introduction (mentioning)
confidence: 99%
“…We used BUSCO with the plant dataset (embryophyta_odb9). For gene prediction BUSCO uses Augustus (Version 3.3) (Stanke et al 2004; König et al 2016). For the gene finding parameters in Augustus we set species to wheat and ran BUSCO in the genome mode (-m geno -sp wheat).…”
Section: Data Validation and Quality Control (mentioning)
confidence: 99%
“…Based on input parameters, CAT will run AUGUSTUS in up to four distinct parameterizations, two of which rely on transMap projections (AugustusTMR) and two that perform ab initio predictions (AugustusCGP and AugustusPB) using extrinsic information to guide prediction. AugustusCGP performs simultaneous comparative prediction (König et al 2016) on all aligned genomes, whereas AugustusPB uses long-read RNA-seq to discover novel isoforms. The output of these modes of AUGUSTUS are evaluated alongside the original transMap projections using a combination of classifiers as well as the output from homGeneMapping (Stanke et al 2004), which uses the Cactus alignments to project features such as annotations and RNA-seq support between the input genomes.…”
Section: Comparative Annotation Toolkit (mentioning)
confidence: 99%
“…In contrast to most earlier alignment methods (Blanchette et al 2004; Miller et al 2007; Earl et al 2014), Progressive Cactus alignments are not reference based, include duplications, and are thus suitable for the annotation of many-to-many orthology relationships. We show how the output of this projected annotation set can be cleaned up and filtered through special application of AUGUSTUS (Stanke et al 2008) and how novel information can be introduced by combining the projected annotation set with predictions produced by Comparative Augustus (König et al 2016). These predictions can be further supplemented and validated by incorporating long-range RNA-sequencing (RNA-seq) data, such as those generated by the Iso-Seq protocol (Gordon et al 2015).…”
mentioning
confidence: 99%